Abstract



The direct product problem is a fundamental question in complexity theory which seeks to understand how the difficulty of computing a function on each of $k$ independent inputs scales with $k$. We prove the following direct product theorem (DPT) for query complexity: if every $T$-query algorithm has success probability at most $1 - \varepsilon$ in computing the Boolean function $f$ on input distribution $\mu$, then for $\alpha \le 1$, every $\alpha\varepsilon T k$-query algorithm has success probability at most $(2^{\alpha\varepsilon}(1 - \varepsilon))^k$ in computing the $k$-fold direct product $f^{\otimes k}$ correctly on $k$ independent inputs from $\mu$. In light of examples due to Shaltiel, this statement gives an essentially optimal tradeoff between the query bound and the error probability. As a corollary, we show that for an absolute constant $\alpha > 0$, the worst-case success probability of any $\alpha R_2(f) k$-query randomized algorithm for $f^{\otimes k}$ falls exponentially with $k$. The best previous statement of this type, due to Klauck, Spalek, and de Wolf, required a query bound of $O(\mathrm{bs}(f) k)$.

The proof involves defining and analyzing a collection of martingales associated with an algorithm attempting to solve $f^{\otimes k}$. Our method is quite general and yields a new XOR lemma and threshold DPT for the query model, as well as DPTs for the query complexity of learning tasks, search problems, and tasks involving interaction with dynamic entities. We also give a version of our DPT in which decision tree size is the resource of interest.
1 Introduction

1.1 Direct product problems 

Suppose some Boolean function $f(x)$ on $n$ input bits is 'hard to compute' for a certain computational model. It seems that computing the $k$-tuple $f^{\otimes k}(x^1, \ldots, x^k) := (f(x^1), \ldots, f(x^k))$ on independent inputs $x^1, \ldots, x^k$ should be 'even harder'. The intuition is that the $k$ tasks to be performed appear separate and unrelated, and that with more tasks there are more chances to make a mistake. One way to make this idea more precise is the so-called direct product problem. In this approach, we try to prove statements of the following form:

Suppose every algorithm using resources at most $T$ has success probability at most $p$ in computing $f$. Then, every algorithm using resources at most $T'$ has success probability at most $p'$ in computing $f^{\otimes k}$ on $k$ independent inputs to $f$.

Such a result is called a direct product theorem (DPT). The direct product problem can be contrasted 
with a second, related question, the direct sum problem, which studies how the complexity of solving 
k instances of a problem scales with k, when we are only interested in algorithms which succeed 
with high probability (or probability 1). For a recent overview of the direct sum problem in query complexity, and proofs of some new results, see [JKS10].

Depending on the computational model and our interests, $T$ and $T'$ might measure time, communication, or any other resource. The success probability could be with respect to some input distribution $\mu$, in which case it is natural to assume in the $k$-fold setting that the inputs are drawn independently from $\mu$; we call this the average-case setting. However, one can also consider the case where $p$ is a bound on the worst-case success probability of a randomized algorithm, ranging over all inputs to $f$; we then try to establish an upper bound $p'$ on the worst-case success probability of query-bounded algorithms for $f^{\otimes k}$. The strength of a direct product theorem can be measured in terms of the dependence of the parameters $T', p'$ on $T, p, k$, and, possibly, on the function $f$ itself. We want $T'$ to be large and $p'$ to be small, to establish that the $k$-fold problem is indeed 'very hard'.

There is also an important variant of the direct product problem, in which we are interested in computing the '$k$-fold XOR' $f^{\oplus k}(x^1, \ldots, x^k) := f(x^1) \oplus \cdots \oplus f(x^k)$ of $k$ independent inputs to $f$. An XOR lemma is a result which upper-bounds the success probability $p'$ achievable by algorithms for $f^{\oplus k}$ using $T'$ resources, under the assumption that any algorithm using $T$ resources has success probability at most $p$.¹ An obvious difference from DPTs is that in an XOR lemma, $p'$ must always be at least $1/2$, since $f^{\oplus k}$ is Boolean and the algorithm could simply guess a random bit. The hope is that $(p' - 1/2)$ decays exponentially with $k$. Research on XOR lemmas has proceeded in parallel with research on direct product theorems; the known results are of similar strength (with some exceptions), and in some cases there are reductions known from XOR lemmas to DPTs or vice versa (see [Ung09, IK10] for an overview and recent results of this type).

The direct product problem has been studied extensively in models including Boolean circuits (e.g., [GNW95, IW97, IJKW10]), communication protocols [Sha03, KSdW07, LSS08, VW08], and query algorithms [IRW94, NRS99, Sha03, KSdW07]. In all of these models, an optimal $T$-bounded algorithm which attempts to compute $f$ can always be applied independently to each of $k$ inputs, using at most $T' = Tk$ resources and succeeding with probability $p' = p^k$, so these are the 'ideal', strongest parameters one might hope for in a DPT. However, direct product statements of such strength are generally false, as was shown by Shaltiel [Sha03], who gave a family of counterexamples which applies to all 'reasonable' computational models. We will describe these examples (specialized to the query model) in Section 4.²

¹Terminology varies somewhat in the literature. For instance, what we call XOR lemmas are called 'direct product theorems' in [Sha03], and what we refer to as direct product problems are in [Sha03] called the 'concatenation variant'.

Thus, all DPTs shown have necessarily been weaker in one of several ways. First, researchers have restricted attention to algorithms of a special form. Shaltiel [Sha03] showed that a DPT with the 'ideal' parameters above holds for the query model, if the algorithm is required to query each of the $k$ inputs exactly $T$ times. He called such algorithms 'fair'.³ A similar result for a special class of query algorithms called 'decision forests' was shown earlier by Nisan, Rudich, and Saks [NRS99].

Second, DPTs have been shown for unrestricted algorithms, but using resource bounds whose strength depends on properties of the function $f$. For example, Klauck, Spalek, and de Wolf [KSdW07] showed that for any $f$ and any $\gamma > 0$, a DPT holds for $f$ in which the achievable worst-case success probability $p'$ is at most $(1/2 + \gamma)^k$, provided $T' \le \alpha \cdot \mathrm{bs}(f) k$ for some constant $\alpha > 0$ depending only on $\gamma$. Here $\mathrm{bs}(f)$ is the block sensitivity of $f$ [Nis89, BdW02], a complexity measure known to be related to the randomized query complexity by the inequalities $R_2(f)^{1/3} \le \mathrm{bs}(f) \le R_2(f)$ (suppressing constant factors). Now, one can always compute $f$ correctly on $k$ instances with high probability using $O(R_2(f) k \log k)$ queries. For many functions, including random functions, $\mathrm{bs}(f) = \Theta(R_2(f))$, and so the DPT of [KSdW07] gives good results. However, examples are known [BdW02] where $\mathrm{bs}(f) = O(\sqrt{R_2(f)})$, so the number of queries allowed by this DPT can be significantly less than one might hope.

Klauck, Spalek, and de Wolf also proved DPTs for quantum query algorithms computing $f$, in which the success probability $p'$ drops exponentially in $k$ if the number of allowed quantum queries is $O(\sqrt{\mathrm{bs}(f)}\, k)$. Spalek [S08] proved a DPT for quantum query algorithms where the resource bound $T'$ scales in terms of a complexity measure called the multiplicative quantum adversary. The ultimate strength of this result is not yet clear, since the relationships between the multiplicative adversary and other complexity measures are not well understood. Finally, for symmetric functions, direct product theorems of a strong form were proved for quantum query complexity by Ambainis, Spalek, and de Wolf [ASdW09].

We have surveyed results in the query model, but in other models, such as communication complexity, our earlier remarks also apply: known direct product theorems in which the allowed communication $T'$ scales with $k$ either apply to specific functions (e.g., in [KSdW07] a direct product theorem was proved for the quantum communication complexity of the Disjointness function, and a classical analogue was proved by Klauck [Kla09]), or else require the allowed resources $T'$ to scale as $D(f)k$, where $D(f)$ is a complexity measure which can be significantly smaller than the resources needed to compute a single instance of $f$. For example, in communication complexity, DPTs have been shown whose strength is related to the so-called discrepancy of $f$ [Sha03, LSS08]. In the Boolean circuit model, despite intensive study, the known results are quantitatively much weaker, and in particular require $T'$ to shrink as $k$ grows in order to make the success probability $p'$ decay as $k$ grows (although if $T'$ is allowed to shrink with $k$, a DPT with $p' = p^k$ can be shown using [Imp95, Hol05], as remarked in [IK10]).

²Shaltiel calls a DPT 'strong' if it applies to all $p, T$ and its parameters satisfy $p' \le p^{\Omega(k)}$ and $T' \ge \Omega(Tk)$. His counterexamples rule out strong DPTs for 'reasonable' computational models. In later works, the modifier 'strong' has been used in a somewhat broader way. We will not use this terminology in the present paper.

³Technically, Shaltiel proved, in our terms, an optimal XOR lemma for fair algorithms, but as he noted, this implies an optimal DPT, and his proof method can also be modified to directly prove an optimal DPT for fair algorithms.



1.2 Our results 

Our main result is the following direct product theorem in the average-case setting:

Theorem 1. Suppose $f$ is a Boolean function and $\mu$ is a distribution over inputs to $f$, such that any $T$-query randomized algorithm has success probability at most $(1 - \varepsilon)$ in computing $f$ on an input from $\mu$. Then for $\alpha \in (0, 1]$, any randomized algorithm making $\alpha\varepsilon T k$ queries has success probability at most $(2^{\alpha\varepsilon}(1 - \varepsilon))^k < (1 - \varepsilon + .84\alpha\varepsilon)^k$ in computing $f^{\otimes k}$ correctly on $k$ inputs drawn independently from $\mu$.

We use Shaltiel's examples to show that the tradeoff in Theorem 1 between the query bound and the error probability is essentially best possible, at least for general functions $f$ and for small values $\alpha < .001$. (For specific functions, the success probability will in some cases decay exponentially even when the number of queries allowed scales as $Tk$ rather than $\varepsilon T k$.) Theorem 1 reveals that small values of $\varepsilon$, as used in Shaltiel's examples, are the only major 'obstruction' to strong, general direct product statements in the query model.

As a corollary of Theorem 1, we obtain the following DPT for worst-case error, which strengthens the worst-case DPT of [KSdW07] mentioned earlier:

Theorem 2. For any Boolean function $f$ and $0 < \gamma < 1/4$, any randomized algorithm making at most $\gamma^3 R_2(f) k / 11$ queries has worst-case success probability less than $(1/2 + \gamma)^k$ in computing $f^{\otimes k}$ correctly.

It seems intuitive that a statement like Theorem 2 should be true, and proving such a DPT was arguably one of the major open problems in classical query complexity.

We also prove a new XOR lemma. Let $B_{k,p}$ denote the binomial distribution on $k$ trials with success probability $p$.

Theorem 3. Suppose that any $T$-query randomized algorithm has success probability at most $(1 - \varepsilon)$ in computing the Boolean function $f$ on an input from $\mu$. Then for $0 < \alpha \le 1$, any randomized algorithm making $\alpha\varepsilon T k$ queries and attempting to compute $f^{\oplus k}$ on $k$ inputs drawn independently from $\mu$ has success probability at most

$$\frac{1}{2}\left(1 + \Pr_{Y \sim B_{k,1-2\varepsilon}}\left[Y \ge (1 - \alpha\varepsilon) k\right]\right),$$

which is less than $\frac{1}{2}\left(1 + \left[1 - 2\varepsilon + 21\alpha\ln(2/\alpha)\,\varepsilon\right]^k\right)$.

Compare the probability bound above with the success probability $\frac{1}{2}(1 + (1 - 2\varepsilon)^k)$, which can be attained using $Tk$ queries by attempting to solve each instance independently and outputting the parity of the guessed bits. The concrete estimate given in Theorem 3 is meant to illustrate how our bound approaches this value as $\alpha \to 0$; by a more careful use of Chernoff inequalities, one can get somewhat tighter bounds for specific ranges of $\alpha, \varepsilon$. An XOR lemma for the worst-case setting can also be derived from our result.
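To make the comparison above concrete, the following Python sketch evaluates the exact binomial-tail bound of Theorem 3, its closed-form estimate, and the success probability $\frac{1}{2}(1 + (1 - 2\varepsilon)^k)$ of the naive parity strategy. The helper names (`binom_tail`, `naive_xor_success`, `xor_bound`, `xor_estimate`) are hypothetical and introduced only for illustration, and the parameter values are arbitrary.

```python
from math import comb, log


def binom_tail(k: int, p: float, threshold: float) -> float:
    """Pr[Y >= threshold] for Y ~ B_{k,p}, summed exactly from the pmf."""
    # A tiny tolerance keeps float round-off in `threshold` from skipping an integer.
    return sum(comb(k, s) * p**s * (1 - p) ** (k - s)
               for s in range(k + 1) if s >= threshold - 1e-12)


def naive_xor_success(k: int, eps: float) -> float:
    """Success of outputting the parity of k independent guesses, each correct
    with probability 1 - eps: the parity is right iff an even number are wrong."""
    return sum(comb(k, s) * eps**s * (1 - eps) ** (k - s) for s in range(0, k + 1, 2))


def xor_bound(k: int, eps: float, alpha: float) -> float:
    """Theorem 3 bound: 1/2 (1 + Pr_{Y ~ B_{k,1-2eps}}[Y >= (1 - alpha*eps) k])."""
    return 0.5 * (1 + binom_tail(k, 1 - 2 * eps, (1 - alpha * eps) * k))


def xor_estimate(k: int, eps: float, alpha: float) -> float:
    """Closed-form estimate: 1/2 (1 + [1 - 2eps + 21 alpha ln(2/alpha) eps]^k)."""
    return 0.5 * (1 + (1 - 2 * eps + 21 * alpha * log(2 / alpha) * eps) ** k)


k, eps = 50, 0.1
print("naive vs closed form:", naive_xor_success(k, eps), 0.5 * (1 + (1 - 2 * eps) ** k))
for alpha in (0.001, 0.005, 0.01):
    print(alpha, xor_bound(k, eps, alpha), xor_estimate(k, eps, alpha))
```

For very small $\alpha$ the threshold $(1 - \alpha\varepsilon)k$ rounds up to $k$, so the exact bound coincides with the naive value, in line with the remark that the bound approaches it as $\alpha \to 0$.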



In addition to our 'ordinary' DPT (Theorem 1), we also prove a 'threshold' DPT, which bounds the probability that a query-bounded algorithm for $f^{\otimes k}$ solves 'too many' of the $k$ instances correctly. As one special case, we prove:

Theorem 4. Let $f$ be a (not necessarily Boolean) function such that any $T$-query algorithm has success probability at most $1 - \varepsilon$ in computing $f$ on an input sampled from $\mu$. Fix any $\eta, \alpha \in (0, 1]$. Consider any randomized algorithm $\mathcal{R}$ making at most $\alpha\varepsilon T k$ queries on $k$ independent inputs from $\mu$. Then the probability that $\mathcal{R}$ computes $f$ correctly on at least $\eta k$ of the inputs is at most

$$\Pr_{Y \sim B_{k,1-\varepsilon}}\left[Y \ge (\eta - \alpha\varepsilon) k\right].$$

Using Chernoff inequalities, Theorem 4 gives success bounds which decay exponentially in $k$ for any fixed $\alpha, \varepsilon, \eta$, provided $\eta > 1 - \varepsilon + \alpha\varepsilon$. As we will explain, Shaltiel's examples show that this cutoff is nearly best possible. By setting $\eta := 1$ in Theorem 4 we also get an ordinary DPT for non-Boolean functions, which in general is stronger than the DPT we would get by a straightforward generalization of our techniques for Theorem 1. This is the simplest way we know to get such a DPT.
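A minimal numerical sketch of the cutoff in Theorem 4, assuming the illustrative values $\varepsilon = 0.2$ and $\alpha = 0.1$ (so the cutoff $1 - \varepsilon + \alpha\varepsilon$ is $0.82$): for $\eta$ above the cutoff the bound shrinks rapidly with $k$, while below it the bound tends to 1 and says nothing. The helper names are hypothetical.

```python
from math import comb


def binom_tail(k: int, p: float, threshold: float) -> float:
    """Pr[Y >= threshold] for Y ~ B_{k,p}; the tolerance absorbs float round-off."""
    return sum(comb(k, s) * p**s * (1 - p) ** (k - s)
               for s in range(k + 1) if s >= threshold - 1e-12)


def threshold_bound(k: int, eps: float, alpha: float, eta: float) -> float:
    """Theorem 4: Pr[solve >= eta*k instances] <= Pr_{Y ~ B_{k,1-eps}}[Y >= (eta - alpha*eps) k]."""
    return binom_tail(k, 1 - eps, (eta - alpha * eps) * k)


eps, alpha = 0.2, 0.1
print("cutoff 1 - eps + alpha*eps =", 1 - eps + alpha * eps)
for eta in (0.78, 0.9):  # one value below the cutoff, one above it
    print(eta, [threshold_bound(k, eps, alpha, eta) for k in (50, 100, 200, 400)])
```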

Threshold DPTs have been proved for a variety of models. Unger [Ung09] shows how to derive threshold DPTs from XOR lemmas, and recent work of Impagliazzo and Kabanets [IK10] gives a way to derive threshold DPTs from sufficiently strong DPTs (see also the earlier works cited in [Ung09, IK10]). However, the results of [IK10] do not apply for our purposes, and the threshold DPT we prove is more general than we would get by applying the results of [Ung09] to our XOR lemma. In any case the proof of our threshold DPT is, we feel, quite natural (and actually forms the basis for our XOR lemma). Our method applies to very general threshold events: we give bounds on the probability that the set $S \subseteq [k]$ of instances solved correctly by a query-bounded algorithm is 'large', in a sense specified by an arbitrary monotone collection $\mathcal{A}$ of subsets of $[k]$. Generalized threshold DPTs of this form were shown recently by Holenstein and Schoenebeck [HS10] in the circuit model, for a rich class of computational tasks called 'weakly verifiable puzzles' (as usual in the circuit model, these DPTs require $T'$ to shrink with $k$). Our techniques appear unrelated to theirs.

We also prove new DPTs for relation problems (for which direct sum theorems were proved recently by Jain, Klauck, and Santha [JKS10]), learning tasks, search problems, and errorless heuristics, as well as a DPT in which decision tree size, rather than depth, is the resource of interest. A DPT for decision tree size was shown previously in [IRW94], which gave an 'ideal' success probability decay $p' = p^k$, but in the case where the size is not allowed to scale with $k$, i.e., the setting $T' = T$. By contrast, in our DPT, the success probability decays as $p^{\Omega(k)} = (1 - \varepsilon)^{\Omega(k)}$, while the size bound $T'$ scales as $T^{\Omega(\varepsilon k)}$. Finally, we give a further generalization of our DPTs, in which the $k$ objects being queried are dynamic entities rather than static strings.

In order to ease notation, in this paper we discuss only DPTs for total functions, but our 
results apply to partial functions (functions with a restricted domain) as well; the proofs are the 
same. Similarly, our theorems and proofs carry over without change to handle non-Boolean input 
alphabets, as well as heterogeneous query costs. Taken as a whole, our results provide a fairly 
complete picture of the 'direct product phenomenon' for randomized query complexity (although 
there may be room for improvement in some of our bounds). We hope this work may also help lead 
to a better understanding of the direct product problem in other, richer computational models. 



1.3 Our methods 

We first explain our method to prove our 'basic' direct product theorem, Theorem 1. As mentioned earlier, Shaltiel [Sha03] proved an optimal DPT for 'fair' decision trees, in which each of the $k$ inputs receives $T$ queries. Our proof method for Theorem 1 also yields an alternate proof of Shaltiel's result, and it is helpful to sketch how this works first. (Really, this 'alternate proof' is little more than a rephrasing of Shaltiel's proof technique, but the rephrasing gives a useful perspective which helps us to prove our new results.)

Suppose that every $T$-query algorithm for computing $f$ succeeds with probability at most $(1 - \varepsilon)$ on an input from the distribution $\mu$. Consider a fair $Tk$-query algorithm $\mathcal{D}$ for $f^{\otimes k}$, running on $k$ independent inputs from $\mu$. We think of the algorithm as a 'gambler' who bets at $k$ 'tables', and we define a random variable $X_{j,t} \in [1/2, 1]$ which represents the gambler's 'fortune' at the $j$-th table after $\mathcal{D}$ has made $t$ queries overall to the $k$ inputs. Roughly speaking, $X_{j,t}$ measures how well the algorithm is doing in determining the value of $f$ on the $j$-th input. When $\mathcal{D}$ queries the $j$-th input, the $j$-th fortune may rise or fall, according to the bit seen; we regard each bit revealed to be generated sequentially at random, conditioned on the bits queried so far. The fortunes are defined so that $X_{j,0} \le 1 - \varepsilon$ for each $j$ (reflecting the assumed hardness of $f$ on $\mu$), and so that no action by the algorithm leads to an expected gain in fortune.⁴ It follows that $\mathbb{E}[\prod_{j \in [k]} X_{j,Tk}] \le (1 - \varepsilon)^k$. But the fortunes are defined so that $\mathbb{E}[\prod_{j \in [k]} X_{j,Tk}]$ upper-bounds the success probability of $\mathcal{D}$ in computing $f^{\otimes k}$. This gives the DPT for fair algorithms.

If $\mathcal{D}$ is no longer required to be fair, but instead makes at most $\alpha\varepsilon T k$ queries, then the individual fortune $X_{j,t}$ we define no longer has the same intuitive meaning after the $j$-th input has been queried more than $T$ times. However, the success probability of $\mathcal{D}$ can still be upper-bounded by $\mathbb{E}[\prod_{j \in S} X_{j,\alpha\varepsilon T k}]$, where $S$ is the (random) set of inputs which receive at most $T$ queries. Counting tells us that fewer than $\alpha\varepsilon k$ of the inputs can lie outside of $S$, and each fortune is always at least $1/2$, so the success probability is at most $2^{\alpha\varepsilon k}\,\mathbb{E}[\prod_{j \in [k]} X_{j,\alpha\varepsilon T k}] \le 2^{\alpha\varepsilon k}(1 - \varepsilon)^k$, giving the statement of Theorem 1.

Our worst-case DPT for Boolean functions is a straightforward corollary of Theorem 1. Our DPT for decision tree size requires a somewhat different analysis, in which we track the 'size-usage' of each of the $k$ inputs rather than their number of queries, but the basic approach is exactly the same as in Theorem 1. In generalizing our method to prove our other results, however, we face a new wrinkle: the natural definitions of the 'fortunes' $X_{j,t}$ in these settings are no longer bounded from below by $1/2$. For example, if $f : \{0,1\}^n \to B$ then we have $X_{j,t} \ge |B|^{-1}$, and a straightforward modification of the method described above gives a DPT whose strength degrades as $|B|$ grows. In other settings (e.g., the $k$-fold XOR setting), we will only have $X_{j,t} \ge 0$, and the method fails completely.⁵

To overcome this difficulty, we adopt a more general perspective. Our previous proof hinged on the fact that, if a gambler plays neutral or unfavorable games at $k$ tables with an initial (nontransferable) endowment of $1 - \varepsilon$ at each table, then the probability he reaches a fortune of 1 at every table is at most $(1 - \varepsilon)^k$. Note that this is just the success probability he would achieve if he followed an independent 'all-or-nothing bet' strategy at each table. It is natural to wonder whether this strategy remains optimal if the gambler merely wants to reach a fortune of 1 at 'sufficiently many' tables. Indeed, we show by a simple induction that this is true, where the meaning of 'sufficiently many' can be specified by any monotone collection of subsets of $[k]$. Most of our generalizations of Theorem 1, as well as our XOR lemma, will follow readily from this handy 'gambling lemma' (Lemma 9).

⁴In standard probabilistic terms, each individual sequence $X_{j,0}, X_{j,1}, \ldots$ is a supermartingale. We will not use this terminology in the paper.

⁵One way to work around the problem is to simply add a small 'buffer term' to the fortunes $X_{j,t}$. However, this leads to poorer bounds, and does not yield our generalized threshold DPTs.
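The all-or-nothing strategy in the gambling picture can be simulated directly. In the sketch below (hypothetical helper names; a Monte Carlo illustration, not a proof) the gambler stakes his whole endowment $1 - \varepsilon$ at each table on a fair bet that moves his fortune to 1 with probability $1 - \varepsilon$ and to 0 otherwise, and the code compares the empirical probability of reaching fortune 1 at $m$ or more tables with the binomial tail $\Pr_{Y \sim B_{k,1-\varepsilon}}[Y \ge m]$; for $m = k$ this is $(1 - \varepsilon)^k$. It does not check the optimality claim, which is the content of Lemma 9.

```python
import random
from math import comb


def all_or_nothing(fortune: float, rng: random.Random) -> float:
    """A fair bet: jump to 1 with probability `fortune`, else to 0.
    The expected fortune after the bet equals the fortune before it."""
    return 1.0 if rng.random() < fortune else 0.0


def simulate(k: int, eps: float, need: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[fortune 1 is reached at >= `need` of k tables],
    each table starting at 1 - eps and playing one independent all-or-nothing bet."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wins = sum(all_or_nothing(1 - eps, rng) == 1.0 for _ in range(k))
        hits += wins >= need
    return hits / trials


def exact(k: int, eps: float, need: int) -> float:
    """The same probability in closed form: Pr_{Y ~ B_{k,1-eps}}[Y >= need]."""
    return sum(comb(k, s) * (1 - eps) ** s * eps ** (k - s) for s in range(need, k + 1))


k, eps = 5, 0.2
for need in (5, 4, 3):  # 'all tables', then weaker monotone threshold events
    print(need, simulate(k, eps, need, trials=20_000), exact(k, eps, need))
```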

1.4 Organization of the paper 

In Section 2 we review preliminaries that are used throughout the paper and that are needed to state and prove our 'basic' DPTs, Theorems 1 and 2. We will introduce other definitions as needed in later sections. In Section 3 we prove Theorem 1, and in Section 4 we use Shaltiel's examples to analyze the tightness of this result. We prove Theorem 2 in Section 5.

In Section 6 we prove our 'gambling lemma', Lemma 9, and use it to prove a generalized threshold DPT for relation problems. Theorem 4 will follow as a special case. We also explain how our threshold DPT implies a DPT for the query complexity of certain learning tasks. We prove Theorem 3, our XOR lemma, in Section 7 (also using Lemma 9). We define search problems and errorless heuristics in Section 8 and give DPTs for these settings.

We prove our DPT for decision tree size in Section 9. In Section 10 we describe generalizations of our DPTs to settings involving interaction with dynamic entities. We end with some questions for future work.

2 Preliminaries 

All of our random variables will be defined over finite probability spaces. We let $\mathrm{supp}(X)$ denote the support of a random variable $X$, i.e., the set of values with nonzero probability. Let $\mu^{\otimes k}$ denote $k$ independent copies of distribution $\mu$.

2.1 Randomized decision trees and query complexity 

A (deterministic) decision tree $\mathcal{D}$ over $\{0,1\}^n$ is a rooted, full binary tree (i.e., each node has either 0 or 2 children), in which interior vertices $v$ are labeled by indices $\mathrm{ind}(v) \in [n]$ and leaf vertices are labeled by values $\ell(v)$ in some finite set $B$ (often $B = \{0,1\}$). The height of $\mathcal{D}$ is the length of the longest descending path in $\mathcal{D}$. $\mathcal{D}$ defines a function $f_{\mathcal{D}} : \{0,1\}^n \to B$ in the following way. On input $x$ we start at the root and follow a descending path through $\mathcal{D}$; at interior node $v$, we pass to the left subchild of $v$ if $x_{\mathrm{ind}(v)} = 0$; otherwise we pass to the right subchild of $v$. When we reach a leaf vertex $v$, we output the value $\ell(v)$. Any deterministic algorithm to compute $f$ which queries at most $t$ bits of $x$ on any input can be modeled as a height-$t$ decision tree, and we will freely refer to such a tree as a '$t$-query deterministic algorithm'.

A randomized decision tree is a probability distribution $\mathcal{R}$ over deterministic decision trees. Upon receiving the input $x$, the algorithm samples $\mathcal{D} \sim \mathcal{R}$, then outputs $\mathcal{D}(x)$. (Every randomized query algorithm can be modeled in this fashion.) We write $\mathcal{R}(x)$ to denote the random variable giving the output of $\mathcal{R}$ on input $x$. We say that $\mathcal{R}$ is a $t$-query randomized decision tree if every decision tree in the support of $\mathcal{R}$ has height at most $t$.



For $\varepsilon \in [0,1]$ and a function $f$ (not necessarily Boolean), we say that $\mathcal{R}$ $\varepsilon$-computes $f$ if for all inputs $x$, $\Pr[\mathcal{R}(x) = f(x)] \ge 1 - \varepsilon$. Similarly, if $\mu$ is a distribution over inputs $x \in \{0,1\}^n$, we say that $\mathcal{R}$ $\varepsilon$-computes $f$ with respect to $\mu$ if $\Pr_{x \sim \mu}[\mathcal{R}(x) = f(x)] \ge 1 - \varepsilon$, where the probability is taken over the random sample $x \sim \mu$ and the randomness used by $\mathcal{R}$.
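The definitions above can be sketched in Python as follows. The encoding (classes `Leaf` and `Node`, 0-based indices, a randomized tree given as a list of (probability, tree) pairs, and $\mu$ given as a dictionary from inputs to probabilities) is an assumption made for this illustration only; `success` computes $\Pr_{x \sim \mu}[\mathcal{R}(x) = f(x)]$ by exhaustive summation, so it is practical only for small $n$.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    value: int  # the label l(v)


@dataclass(frozen=True)
class Node:
    index: int     # ind(v), 0-based in this sketch
    left: "Tree"   # followed when x[index] == 0
    right: "Tree"  # followed when x[index] == 1


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: Tuple[int, ...]) -> int:
    """Follow the descending path selected by x and return the leaf label f_D(x)."""
    while isinstance(tree, Node):
        tree = tree.right if x[tree.index] else tree.left
    return tree.value


def height(tree: Tree) -> int:
    """Length of the longest root-to-leaf path (the worst-case number of queries)."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(height(tree.left), height(tree.right))


def success(rand_tree: List[Tuple[float, Tree]],
            f: Callable[[Tuple[int, ...]], int],
            mu: Dict[Tuple[int, ...], float]) -> float:
    """Pr_{x ~ mu, D ~ R}[D(x) = f(x)] for R given as (probability, tree) pairs."""
    return sum(q * p for q, tree in rand_tree
               for x, p in mu.items() if evaluate(tree, x) == f(x))


# Example: AND of two bits under the uniform distribution.
and_tree = Node(0, Leaf(0), Node(1, Leaf(0), Leaf(1)))
uniform = {x: 0.25 for x in product((0, 1), repeat=2)}
R = [(0.5, and_tree), (0.5, Leaf(0))]  # a 2-query randomized decision tree
print(height(and_tree), success(R, lambda x: x[0] & x[1], uniform))
```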

For a function $f : \{0,1\}^n \to B$, we define $R_2(f)$, the 2-sided-error randomized query complexity of $f$, as the minimum $t$ for which there exists a $t$-query randomized decision tree which $1/3$-computes $f$. We define $\mathrm{Suc}_{T,\mu}(f) := 1 - \varepsilon$, where $\varepsilon \ge 0$ is the minimum value for which some $T$-query randomized algorithm $\mathcal{R}$ $\varepsilon$-computes $f$ with respect to $\mu$. By standard arguments, this maximum success probability exists (and is attained by a deterministic height-$T$ decision tree).

For $f : \{0,1\}^n \to B$ and $k \ge 1$, define $f^{\otimes k} : \{0,1\}^{kn} \to B^k$, the $k$-fold direct product of $f$, as $f^{\otimes k}(x^1, \ldots, x^k) := (f(x^1), \ldots, f(x^k))$. If $f$ is Boolean, define the $k$-fold XOR of $f$ as $f^{\oplus k}(x^1, \ldots, x^k) := f(x^1) \oplus \cdots \oplus f(x^k)$, where $\oplus$ denotes addition mod 2.

2.2 Binomial distributions and Chernoff bounds 

Let $B_{k,p}$ denote the binomial distribution on $k$ trials with bias $p$. That is, $B_{k,p}$ is distributed as $Y = \sum_{i=1}^{k} Y_i$, where the $Y_i$ are independent and 0/1-valued with $\Pr[Y_i = 1] = p$. For $s \in \{0, 1, \ldots, k\}$ we have the explicit formula $\Pr[Y = s] = \binom{k}{s} p^s (1 - p)^{k-s}$.

The following form of Chernoff's inequality will be convenient for us. The proof is in Appendix A.

Lemma 5. Let $\delta \in (0, 1)$, and let $Y \sim B_{k,1-\delta}$. If $\beta \in (0, 1/2]$, then

$$\Pr[Y \ge (1 - \beta\delta) k] \le \left[1 - \delta + 21\beta\ln(1/\beta)\,\delta\right]^k.$$
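Lemma 5 can be sanity-checked numerically by comparing the exact upper tail of $B_{k,1-\delta}$, computed from the explicit formula for $\Pr[Y = s]$, with the right-hand side of the lemma. This is a spot check on an arbitrary grid of parameters, not a substitute for the proof in Appendix A; the helper names are hypothetical. For larger $\beta$ the base $1 - \delta + 21\beta\ln(1/\beta)\delta$ exceeds 1 and the bound is trivially true.

```python
from math import ceil, comb, log


def upper_tail(k: int, p: float, threshold: float) -> float:
    """Exact Pr[Y >= threshold] for Y ~ B_{k,p}."""
    start = max(0, ceil(threshold - 1e-12))  # tolerance guards against float round-off
    return sum(comb(k, s) * p**s * (1 - p) ** (k - s) for s in range(start, k + 1))


def lemma5_bound(k: int, delta: float, beta: float) -> float:
    """Right-hand side of Lemma 5: [1 - delta + 21 beta ln(1/beta) delta]^k."""
    return (1 - delta + 21 * beta * log(1 / beta) * delta) ** k


def check_lemma5(k: int, delta: float, beta: float) -> bool:
    """True if the exact tail Pr[Y >= (1 - beta*delta) k] is within the Lemma 5 bound."""
    assert 0 < delta < 1 and 0 < beta <= 0.5
    exact = upper_tail(k, 1 - delta, (1 - beta * delta) * k)
    return exact <= lemma5_bound(k, delta, beta)


for k in (20, 100, 400):
    for delta in (0.05, 0.2, 0.5):
        for beta in (0.005, 0.01, 0.05, 0.5):
            assert check_lemma5(k, delta, beta), (k, delta, beta)
print("Lemma 5 bound holds on the sampled grid")
```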

To apply Lemma 5, it is helpful to understand the behavior of the function $h(x) := x\ln(1/x)$. This function is increasing on $(0, e^{-1}]$, and as $x \to 0$, $h(x)$ approaches 0 only slightly more slowly than $x$ itself. For example, if $n > 1$ we have

$$h\left(\frac{1}{2n\ln n}\right) = \frac{\ln(2n\ln n)}{2n\ln n} = \frac{1}{n}\cdot\frac{\ln(2n\ln n)}{\ln(n^2)} < \frac{1}{n},$$

where the last step uses $2n\ln n < n^2$ (equivalently, $2\ln n < n$).
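A short numerical illustration of these two facts about $h$, assuming nothing beyond the formula $h(x) = x\ln(1/x)$; the function names are illustrative only.

```python
from math import e, log


def h(x: float) -> float:
    """h(x) = x ln(1/x), for x in (0, 1]."""
    return x * log(1 / x)


def worked_example(n: int) -> float:
    """h(1/(2 n ln n)), which the text shows is below 1/n for every n > 1."""
    return h(1 / (2 * n * log(n)))


# As x shrinks, the ratio h(x)/x = ln(1/x) grows only logarithmically.
for x in (1e-1, 1e-3, 1e-6):
    print(x, h(x), h(x) / x)
print([(n, worked_example(n), 1 / n) for n in (2, 10, 1000)])
```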

3 Proof of Theorem 1

In this section we prove our 'basic' direct product theorem: 

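Theorem 1 (restated). Let $f$ be a Boolean function for which $\mathrm{Suc}_{T,\mu}(f) \le 1 - \varepsilon$. Then for $0 < \alpha \le 1$,

$$\mathrm{Suc}_{\alpha\varepsilon T k,\, \mu^{\otimes k}}(f^{\otimes k}) \le \left(2^{\alpha\varepsilon}(1 - \varepsilon)\right)^k < (1 - \varepsilon + .84\alpha\varepsilon)^k.$$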

Proof. First we make some simplifying observations. The statement is a triviality if $T = 0$ or $\varepsilon = 0$, so assume both are positive. Also, by convexity, it is sufficient to show the statement for deterministic algorithms. Finally, by a standard limiting argument, it is enough to prove this result under the assumption that $\mathrm{supp}(\mu) = \{0,1\}^n$; this ensures that conditioning on any sequence of query outcomes will be well-defined.

Next we set up some notation and concepts relating to the computation of / on a single input; 
afterwards we will apply our work to the direct-product setting. 



For a string $u \in \{0,1,*\}^n$, let the distribution $\mu^{(u)}$ be defined as a sample from $\mu$, conditioned on the event $[x_i = u_i \ \forall i \text{ such that } u_i \in \{0,1\}]$. Let $|u|$ denote the number of 0/1 entries in $u$. Let $u[x_i \leftarrow b]$ denote the string $u$ with the $i$-th coordinate set to $b$. In our proof we consider the bits of an input $y \sim \mu$ to be generated sequentially at random as they are queried. Thus if an input is drawn according to $\mu$, and $u$ describes the outcomes of queries made so far (with $*$ in the coordinates that have not been queried), we consider the input to be in the 'state' $\mu^{(u)}$. If some index $i \in [n]$ is queried next, then the algorithm sees a 0 with probability $\Pr_{y \sim \mu^{(u)}}[y_i = 0]$, in which case the input enters state $\mu^{(u[x_i \leftarrow 0])}$; with the remaining probability the algorithm sees a 1 and the input enters state $\mu^{(u[x_i \leftarrow 1])}$. Clearly this interpretation is statistically equivalent to regarding the input as being drawn from $\mu$ before the algorithm begins (this is the 'principle of deferred decisions' of probability theory).

For each $u \in \{0,1,*\}^n$, and each deterministic algorithm $\mathcal{D}$ on $n$ input bits, let $W(u, \mathcal{D}) := \Pr_{y \sim \mu^{(u)}}[\mathcal{D}(y) = f(y)]$. If $|u| \le T$, we define $W^*(u) := \max_{\mathcal{D}} W(u, \mathcal{D})$, where the max ranges over all deterministic algorithms $\mathcal{D}$ making at most $(T - |u|)$ queries. Clearly $W^*(u) \in [1/2, 1]$, since an algorithm may simply output whichever of 0 and 1 is the more likely value of $f(y)$. We make two more simple claims about this function.

Lemma 6. 1. $W^*(*^n) \le 1 - \varepsilon$.

2. For any $u \in \{0,1,*\}^n$ with $|u| < T$, and any $i \in [n]$, $\mathbb{E}_{y \sim \mu^{(u)}}[W^*(u[x_i \leftarrow y_i])] \le W^*(u)$.

Proof. 1: This is immediate from our initial assumption $\mathrm{Suc}_{T,\mu}(f) \le 1 - \varepsilon$.

For 2: If the $i$-th coordinate has already been queried (i.e., $u_i \in \{0,1\}$), then $y_i = u_i$ with probability 1, so $u[x_i \leftarrow y_i] = u$ and the statement is trivial. So assume $u_i = *$. Let $\mathcal{D}_0, \mathcal{D}_1$ be algorithms making at most $T - (|u| + 1)$ queries and maximizing the success probabilities on $\mu^{(u[x_i \leftarrow 0])}, \mu^{(u[x_i \leftarrow 1])}$ respectively; that is, $W(u[x_i \leftarrow 0], \mathcal{D}_0) = W^*(u[x_i \leftarrow 0])$ and $W(u[x_i \leftarrow 1], \mathcal{D}_1) = W^*(u[x_i \leftarrow 1])$. Consider an algorithm $\mathcal{D}$ which queries $x_i$, then runs $\mathcal{D}_b$ if the bit seen is $b$. $\mathcal{D}$ makes at most $T - |u|$ queries, and we compute $W(u, \mathcal{D}) = \mathbb{E}_{y \sim \mu^{(u)}}[W^*(u[x_i \leftarrow y_i])]$. Thus $W^*(u)$ is at least this value. □
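For small $n$, $W^*(u)$ can be computed exactly by recursing over which unqueried coordinate to query next, mirroring the construction in the proof of part 2 of Lemma 6: from each state the best algorithm either stops and outputs the more likely value of $f$, or queries some $x_i$ and continues optimally from $u[x_i \leftarrow 0]$ or $u[x_i \leftarrow 1]$. The sketch below is a hypothetical brute-force helper (exponential in $n$) that assumes $\mu$ is given as a dictionary over $\{0,1\}^n$. The usage applies it to Shaltiel's function $f_T$ from Section 4 with $T = 1$, where the value at $u = *^n$ should come out to $1 - \varepsilon$.

```python
from functools import lru_cache
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Input = Tuple[int, ...]
State = Tuple[Optional[int], ...]  # None plays the role of '*' (not yet queried)


def make_w_star(f: Callable[[Input], int], mu: Dict[Input, float], T: int):
    """Return a function computing W*(u) for |u| <= T by exhaustive recursion."""

    def consistent(u: State):
        # Inputs in the support of mu that agree with every revealed bit of u.
        return [(x, p) for x, p in mu.items()
                if p > 0 and all(ui is None or ui == xi for ui, xi in zip(u, x))]

    @lru_cache(maxsize=None)
    def best(u: State, budget: int) -> float:
        pts = consistent(u)
        total = sum(p for _, p in pts)
        # Option 1: stop and output the most likely value of f under mu^(u).
        mass: Dict[int, float] = {}
        for x, p in pts:
            mass[f(x)] = mass.get(f(x), 0.0) + p
        value = max(mass.values()) / total
        if budget == 0:
            return value
        # Option 2: query an unrevealed coordinate i, then continue optimally.
        for i, ui in enumerate(u):
            if ui is not None:
                continue
            expected = 0.0
            for b in (0, 1):
                pb = sum(p for x, p in pts if x[i] == b)
                if pb > 0:
                    child = u[:i] + (b,) + u[i + 1:]
                    expected += (pb / total) * best(child, budget - 1)
            value = max(value, expected)
        return value

    def w_star(u: State) -> float:
        revealed = sum(ui is not None for ui in u)
        if revealed > T:
            raise ValueError("W*(u) is only defined when |u| <= T")
        return best(u, T - revealed)

    return w_star


def f_shaltiel(x: Input) -> int:
    """Shaltiel's f_T for T = 1 (0-based: x[0] is the selector bit x_1)."""
    return x[1] if x[0] == 1 else x[1] ^ x[2]


eps = 0.1
mu = {x: (1 - 2 * eps if x[0] == 1 else 2 * eps) * 0.25
      for x in product((0, 1), repeat=3)}
w = make_w_star(f_shaltiel, mu, T=1)
print(w((None, None, None)), 1 - eps)
```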

Now we prove the Theorem. Let $\mathcal{D}$ be any deterministic algorithm making at most $M := \lfloor \alpha\varepsilon T k \rfloor$ queries, and attempting to compute $f^{\otimes k}$ on input strings $x^1, \ldots, x^k$ sampled from $\mu^{\otimes k}$. For $j \in [k]$ and $0 \le t \le M$, let $u^j_t \in \{0,1,*\}^n$ be the random string giving the outcomes of all queries made to $x^j$ after $\mathcal{D}$ has made $t$ queries (to the entire input). We need the following important observation:

Lemma 7. Condition on any execution of $\mathcal{D}$ for the first $t \ge 0$ steps, with query outcomes given by $u^1_t, \ldots, u^k_t$. Then the input is in the state $\mu^{(u^1_t)} \times \cdots \times \mu^{(u^k_t)}$; that is, the $k$ inputs are independent, with $x^j$ distributed as $\mu^{(u^j_t)}$.

We include the simple proof in Appendix B.
Define collections

$$\mathcal{X} = \{X_{j,t}\}_{j \in [k],\, 0 \le t \le M}, \qquad \mathcal{P} = \{P_t\}_{0 \le t \le M}$$

of random variables, as follows. All the random variables are determined by the execution of $\mathcal{D}$ on an input drawn from $\mu^{\otimes k}$. Let $X_{j,t} := W^*(u^j_t)$ if $|u^j_t| \le T$; otherwise let $X_{j,t} := 1/2$. Let

$$P_t := \prod_{j \in [k]} X_{j,t}.$$

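We claim that for each $0 \le t < M$, $\mathbb{E}[P_{t+1}] \le \mathbb{E}[P_t]$. To see this, condition on any outcomes to the first $t$ queries, described by $u^1_t, \ldots, u^k_t$. Now suppose that for the $(t+1)$-st query, $\mathcal{D}$ queries the $i$-th bit of the $j$-th input ($i, j$ are determined by $u^1_t, \ldots, u^k_t$, since $\mathcal{D}$ is deterministic). We note that $X_{j',t+1} = X_{j',t}$ for all $j' \ne j$. If $|u^j_t| \ge T$, then also $X_{j,t+1} \le X_{j,t}$, which implies $P_{t+1} \le P_t$. So assume $|u^j_t| < T$. Then we have

$$
\begin{aligned}
\mathbb{E}[P_{t+1} \mid u^1_t, \ldots, u^k_t] &= \mathbb{E}\Big[X_{j,t+1} \cdot \prod_{j' \ne j} X_{j',t+1} \;\Big|\; u^1_t, \ldots, u^k_t\Big] \\
&= \mathbb{E}[X_{j,t+1} \mid u^1_t, \ldots, u^k_t] \cdot \prod_{j' \ne j} X_{j',t} \\
&\le X_{j,t} \cdot \prod_{j' \ne j} X_{j',t} = P_t,
\end{aligned}
$$

where we used part 2 of Lemma 6 (applicable since, by Lemma 7, $x^j \sim \mu^{(u^j_t)}$ under our conditioning). We conclude

$$\mathbb{E}[P_{t+1}] = \mathbb{E}\big[\mathbb{E}[P_{t+1} \mid u^1_t, \ldots, u^k_t]\big] \le \mathbb{E}[P_t],$$

as claimed. It follows that $\mathbb{E}[P_M] \le \mathbb{E}[P_0]$. But we can bound $P_0$ directly: $P_0 = W^*(*^n)^k \le (1 - \varepsilon)^k$ (Lemma 6, part 1). Thus $\mathbb{E}[P_M] \le (1 - \varepsilon)^k$.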

Now we argue that this implies an upper bound on the success probability of $\mathcal{D}$. For $j \in [k]$, define the random variable $\Delta_j := X_{j,M} - 1/2$, so $\Delta_j \in [0, 1/2]$. Condition on the bits $u^1_M, \ldots, u^k_M$ seen by $\mathcal{D}$ during a complete execution. For each $j \in [k]$, there are two possibilities: either $|u^j_M| > T$, or the $j$-th input is in a final state $\mu^{(u^j_M)}$ for which $\Pr_{x \sim \mu^{(u^j_M)}}[f(x) = 1] \in [1/2 - \Delta_j, 1/2 + \Delta_j]$. Since the $k$ inputs remain independent under our conditioning, the conditional probability that $\mathcal{D}$ computes $f^{\otimes k}$ correctly is at most $\prod_{j : |u^j_M| \le T} (1/2 + \Delta_j) = \prod_{j : |u^j_M| \le T} X_{j,M}$.

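$\mathcal{D}$ makes at most $\alpha\varepsilon T k$ queries, and each index $j$ with $|u^j_M| > T$ accounts for at least $T + 1$ of them, so simple counting tells us that there are fewer than $\alpha\varepsilon k$ indices $j$ for which $|u^j_M| > T$. Thus,

$$\prod_{j : |u^j_M| \le T} X_{j,M} = \frac{P_M}{\prod_{j : |u^j_M| > T} X_{j,M}} \le P_M \cdot 2^{|\{j \,:\, |u^j_M| > T\}|} < P_M \cdot 2^{\alpha\varepsilon k}$$

(since $X_{j,M} \ge 1/2$ for all $j$). Taking expectations, we find that the overall success probability of $\mathcal{D}$ is less than $\mathbb{E}[P_M \cdot 2^{\alpha\varepsilon k}] \le (2^{\alpha\varepsilon}(1 - \varepsilon))^k$.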

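Finally, we simplify our bound. We claim $2^x \le 1 + .84x$ on $[0, 1/2]$. To see this, just note that $2^0 = 1$, that $2^{1/2} < 1.42 = 1 + .84(1/2)$, and that $2^x$ is a convex function on $\mathbb{R}$. Then, since $0 < \alpha\varepsilon \le 1/2$ (recall $\varepsilon \le 1/2$ because $f$ is Boolean), we have $2^{\alpha\varepsilon}(1 - \varepsilon) \le (1 + .84\alpha\varepsilon)(1 - \varepsilon) < 1 - \varepsilon + .84\alpha\varepsilon$. The Theorem follows. □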
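The final simplification can also be checked numerically. The sketch below (hypothetical helper names, arbitrary parameters) tests the chord inequality $2^x \le 1 + .84x$ on a grid of $[0, 1/2]$ and evaluates both forms of the Theorem 1 bound.

```python
def theorem1_bounds(k: int, eps: float, alpha: float):
    """Return ((2^{alpha eps} (1 - eps))^k, (1 - eps + .84 alpha eps)^k) from Theorem 1."""
    sharp = (2 ** (alpha * eps) * (1 - eps)) ** k
    simplified = (1 - eps + 0.84 * alpha * eps) ** k
    return sharp, simplified


def chord_inequality_holds(points: int = 10_001) -> bool:
    """Check 2^x <= 1 + .84 x on an evenly spaced grid of [0, 1/2]."""
    xs = (0.5 * i / (points - 1) for i in range(points))
    return all(2 ** x <= 1 + 0.84 * x for x in xs)


print(chord_inequality_holds())
for alpha in (0.01, 0.1, 1.0):
    print(alpha, theorem1_bounds(k=100, eps=0.1, alpha=alpha))
```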

We remark that, as claimed in the Introduction, the proof above can be easily adapted to give an alternate proof of Shaltiel's optimal direct product theorem for 'fair' algorithms making $Tk$ queries: we define the random variables $X_{j,t}$ exactly as before and note that $|u^j_t| \le T$ for all $j, t$.

4 Tightness of the Bounds in Theorem 1

In this section we describe a family of functions and input distributions, due to Shaltiel [Sha03], and explain why they show that the query/success tradeoff in Theorem 1 is nearly best possible, at least when $\alpha < .001$ and when $(1 - \varepsilon)$ is also at most a small constant.
Fixing an integer $T > 0$, define $f_T : \{0,1\}^{T+2} \to \{0,1\}$ as follows: let $f_T(x) := x_2$ if $x_1 = 1$, otherwise $f_T(x) := x_2 \oplus \cdots \oplus x_{T+2}$. Given $\varepsilon \in (0, 1/2)$, let $\mu_\varepsilon$ be the distribution over $\{0,1\}^{T+2}$ in which all bits are independent, $\Pr[x_1 = 1] = 1 - 2\varepsilon$, and $\Pr[x_i = 1] = 1/2$ for all $i \in \{2, \ldots, T+2\}$.
Note that if $y \sim \mu_\varepsilon$, a $T$-query-bounded algorithm can gain no information about the value of $f_T(y)$ when $y_1 = 0$, so any such algorithm succeeds with probability at most $(1 - 2\varepsilon) \cdot 1 + (2\varepsilon) \cdot \frac{1}{2} = 1 - \varepsilon$ in computing $f_T(y)$.



Now consider the following algorithm $\mathcal{D}$ attempting to compute $f_T^{\otimes k}$ on inputs $x^1, \ldots, x^k \sim \mu_\varepsilon^{\otimes k}$.



First $\mathcal{D}$ queries the first two bits of each input. Call an input $x^j$ 'bad' if its first bit is 0, 'good' if its first bit is 1. Let $B \subseteq [k]$ denote the set of bad inputs. Note that $\mathcal{D}$ learns the value of $f_T$ on each good input. Next, $\mathcal{D}$ arbitrarily chooses a set $S \subseteq B$ of $\lfloor \alpha\varepsilon k \rfloor$ bad inputs, and spends $T$ additional queries on each input in $S$ to determine the value of $f_T$ on these inputs (if there are fewer than $\lfloor \alpha\varepsilon k \rfloor$ bad inputs, $\mathcal{D}$ queries them all and determines the value of $f_T^{\otimes k}$ with certainty). Finally, $\mathcal{D}$ outputs the answer bits it has learned and makes random guesses for the remaining values.

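Observe that $\mathcal{D}$ uses at most $2k + \alpha\varepsilon T k$ queries in total. To analyze the success probability of $\mathcal{D}$, first consider an algorithm $\mathcal{D}'$ which uses only $2k$ queries to look at the first two bits of each input, outputting the correct value on good inputs and guessing randomly on bad inputs. It is easy to see that $\mathcal{D}'$ succeeds with probability $(1 - \varepsilon)^k$ in computing $f_T^{\otimes k}$, since each input is independently handled correctly with probability $(1 - 2\varepsilon) + 2\varepsilon \cdot \frac{1}{2} = 1 - \varepsilon$. Also, if $\mathcal{D}$ and $\mathcal{D}'$ are both run on a common $k$-tuple of inputs drawn from $\mu_\varepsilon^{\otimes k}$, and we condition on the event that $|B| \ge \lfloor \alpha\varepsilon k \rfloor$, then the conditional success probability of $\mathcal{D}$ is $2^{\lfloor \alpha\varepsilon k \rfloor}$ times the conditional success probability of $\mathcal{D}'$, since $\mathcal{D}$ has $\lfloor \alpha\varepsilon k \rfloor$ fewer random guesses to make. Thus,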



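$$
\begin{aligned}
\Pr[\mathcal{D} \text{ succeeds}] &\ge \Pr[|B| \ge \alpha\varepsilon k] \cdot 2^{\lfloor \alpha\varepsilon k \rfloor} \Pr[\mathcal{D}' \text{ succeeds} \mid |B| \ge \alpha\varepsilon k] \\
&= 2^{\lfloor \alpha\varepsilon k \rfloor} \Pr[\mathcal{D}' \text{ succeeds} \wedge |B| \ge \alpha\varepsilon k] \\
&\ge 2^{\lfloor \alpha\varepsilon k \rfloor} \cdot \left(\Pr[\mathcal{D}' \text{ succeeds}] - \Pr[|B| < \alpha\varepsilon k]\right) \\
&= 2^{\lfloor \alpha\varepsilon k \rfloor} \cdot \left((1 - \varepsilon)^k - \Pr[|B| < \alpha\varepsilon k]\right). \qquad (1)
\end{aligned}
$$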

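Define the indicator variable $Y_j := \mathbb{1}[j \notin B]$; then the $Y_j$'s are independent, with $p = \Pr[Y_j = 1] = 1 - 2\varepsilon$. Let $Y := Y_1 + \cdots + Y_k$. We apply Lemma 5 to $Y$, with the settings $\delta := 2\varepsilon$ and $\beta := \alpha/2 \le 1/2$, to obtain

$$
\begin{aligned}
\Pr[|B| < \alpha\varepsilon k] &= \Pr[Y > (1 - \alpha\varepsilon)k] \\
&= \Pr[Y > (1 - (2\varepsilon)(\alpha/2))k] \\
&\le \left[1 - 2\varepsilon + 21(\alpha/2)\ln(2/\alpha)(2\varepsilon)\right]^k.
\end{aligned}
$$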

This can be made less than $(1 - 1.5\varepsilon)^k$ if $\alpha$ is a small enough positive constant ($\alpha \le .001$ will work). Now if $(1 - \varepsilon)^k$ is also at most a sufficiently small constant, then $(1 - 1.5\varepsilon)^k \le .1(1 - \varepsilon)^k$, so that, by Eq. (1),

$$\Pr[\mathcal{D} \text{ succeeds}] \ge .9 \cdot 2^{\lfloor \alpha\varepsilon k \rfloor}(1 - \varepsilon)^k,$$

which is close to the maximum success probability allowed by Theorem 1 if $\mathcal{D}$ used $\alpha\varepsilon T k$ queries (recall, though, that $\mathcal{D}$ uses $2k + \alpha\varepsilon T k$ queries).
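A short Python sketch can make Shaltiel's example concrete. It implements $f_T$ and, instead of simulating individual queries, computes the success probabilities of $\mathcal{D}'$ and $\mathcal{D}$ exactly by summing over $|B| \sim \mathrm{Binomial}(k, 2\varepsilon)$. It then checks that $\mathcal{D}'$ succeeds with probability $(1-\varepsilon)^k$, that inequality (1) holds, and that $\alpha = .001$ makes the Lemma 5 base at most $1 - 1.5\varepsilon$. The function names and parameter values are illustrative, and `lemma5_base` is transcribed from the displayed bound above, not from the Lemma's own statement.

```python
import math


def f_T(x: list[int]) -> int:
    """Shaltiel's function on T+2 bits; x[0] plays the role of x_1 in the text."""
    if x[0] == 1:
        return x[1]
    parity = 0
    for bit in x[1:]:  # x_2 XOR ... XOR x_{T+2}
        parity ^= bit
    return parity


def binom_pmf(k: int, b: int, p: float) -> float:
    return math.comb(k, b) * p ** b * (1 - p) ** (k - b)


def shaltiel_success(k: int, eps: float, alpha: float) -> tuple[float, float, float]:
    """Exact success probabilities of D' and D, plus Pr[|B| < alpha*eps*k].

    |B| ~ Binomial(k, 2*eps). D' guesses every bad input at random; D resolves
    s = floor(alpha*eps*k) bad inputs exactly and guesses the rest.
    """
    s = math.floor(alpha * eps * k)
    d_prime = d = few_bad = 0.0
    for b in range(k + 1):
        w = binom_pmf(k, b, 2 * eps)
        d_prime += w * 0.5 ** b
        d += w * 0.5 ** max(b - s, 0)
        if b < alpha * eps * k:
            few_bad += w
    return d_prime, d, few_bad


def lemma5_base(alpha: float, eps: float) -> float:
    """Base of the Lemma 5 bound with delta = 2*eps and beta = alpha/2, as displayed above."""
    return 1 - 2 * eps + 21 * (alpha / 2) * math.log(2 / alpha) * (2 * eps)


k, eps, alpha = 50, 0.1, 0.5
d_prime, d, few_bad = shaltiel_success(k, eps, alpha)
s = math.floor(alpha * eps * k)
print(d_prime, (1 - eps) ** k)             # D' succeeds with probability (1 - eps)^k
print(d >= 2 ** s * (d_prime - few_bad))   # inequality (1)
```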






5 Proof of Theorem 2

We now prove Theorem 2 from the Introduction, our DPT for worst-case error, by combining Theorem 1 with a version of Yao's minimax principle [Yao77], which allows us to convert worst-case hardness assumptions in query complexity into average-case assumptions.

Define $R_{2,\delta}(f)$ as the minimum $T$ for which there exists a randomized $T$-query algorithm which computes $f(x)$ correctly with probability at least $1 - \delta$ for every $x$. The following is a common version of Yao's principle, which can be proved directly using the Minimax Theorem of game theory.

Lemma 8. Fix $0 \le \delta < 1/2$ and a Boolean function $f$. There exists a distribution $\mu_\delta$ over inputs to $f$, such that every randomized algorithm making fewer than $R_{2,\delta}(f)$ queries succeeds in computing $f$ on $\mu_\delta$ with probability less than $1 - \delta$.

Proof of Theorem 2. Let $f$ be given. Let $\delta := 1/2 - \gamma/2$, and let $\mu := \mu_\delta$ be as provided by Lemma 8, so that every algorithm making fewer than $R_{2,\delta}(f)$ queries succeeds in computing $f$ on $\mu$ with probability less than $1 - \delta$. We apply Theorem 1 to $f, \mu$, where we have $\varepsilon \ge \delta \ge 3/8$. With the setting $\alpha := \gamma$, we conclude that any algorithm making fewer than $\alpha\varepsilon R_{2,\delta}(f) k$ queries succeeds with probability less than

$$
\begin{aligned}
(1 - (1 - .84\gamma)\varepsilon)^k &\le (1 - (1 - .84\gamma)(1/2 - \gamma/2))^k \\
&< (1/2 + .42\gamma + \gamma/2)^k \\
&< (1/2 + \gamma)^k
\end{aligned}
$$

in computing $f^{\otimes k}$ on inputs $x^1, \ldots, x^k \sim \mu^{\otimes k}$. So, the worst-case success probability is also less than this amount.

Now we relate $R_{2,\delta}(f)$ to $R_2(f)$ by standard sampling ideas. Say $\mathcal{R}_\delta$ is an algorithm making $R_{2,\delta}(f)$ queries, which computes $f(x)$ with probability at least $1 - \delta = 1/2 + \gamma/2$ on each input. Let $\mathcal{R}$ be the algorithm which, given an input $x$, runs $\mathcal{R}_\delta(x)$ for $m := \lceil 3/\gamma^2 \rceil$ trials, outputting the majority value. For $i \in [m]$, define the indicator variable $Y_i$ for the event [$\mathcal{R}_\delta$ succeeds on the $i$-th trial], and let $Y := Y_1 + \cdots + Y_m$. Then the probability that $\mathcal{R}(x)$ outputs an incorrect value is at most the probability that $Y \le \mathbb{E}[Y] - \gamma m/2$, which by Hoeffding's inequality is at most

$$e^{-2(\gamma m/2)^2/m} = e^{-\gamma^2 m/2} \le e^{-3/2} < 1/3.$$

Thus, $R_2(f) \le R_{2,\delta}(f) \cdot \lceil 3/\gamma^2 \rceil \le 4R_{2,\delta}(f)/\gamma^2$ (using $\gamma \le 1/4$). Now

$$\gamma^3 R_2(f) k/11 < \gamma(3/8)(\gamma^2 R_2(f)/4) k \le \alpha\varepsilon R_{2,\delta}(f) k,$$

so any algorithm making at most $\gamma^3 R_2(f) k/11$ queries has worst-case success probability less than $(1/2 + \gamma)^k$ in computing $f^{\otimes k}$. □
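The two quantitative steps of this proof, the arithmetic bound $1 - (1 - .84\gamma)(1/2 - \gamma/2) < 1/2 + \gamma$ and the majority-vote amplification, can be checked numerically with the Python sketch below. It models $\mathcal{R}_\delta$ abstractly as independent trials that are each correct with probability exactly $1/2 + \gamma/2$ (an assumption made for illustration; the real algorithm is only guaranteed at least this), and compares the exact majority error with the Hoeffding bound $e^{-\gamma^2 m/2}$. Function names are illustrative.

```python
import math


def binom_pmf(m: int, y: int, p: float) -> float:
    """Binomial(m, p) probability mass at y, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
               + y * math.log(p) + (m - y) * math.log1p(-p))
    return math.exp(log_pmf)


def majority_error(gamma: float) -> tuple[int, float, float]:
    """Majority vote over m = ceil(3/gamma^2) runs, each correct w.p. 1/2 + gamma/2.

    Returns (m, exact error probability, Hoeffding bound exp(-gamma^2 m / 2)).
    Ties (Y = m/2) are counted as errors.
    """
    m = math.ceil(3 / gamma ** 2)
    p = 0.5 + gamma / 2
    error = sum(binom_pmf(m, y, p) for y in range(m + 1) if 2 * y <= m)
    return m, error, math.exp(-gamma ** 2 * m / 2)


def success_base(gamma: float) -> float:
    """1 - (1 - .84*gamma)(1/2 - gamma/2), the per-instance base before the final simplification."""
    return 1 - (1 - 0.84 * gamma) * (0.5 - gamma / 2)


for gamma in (0.25, 0.1, 0.05):
    m, err, hoeffding = majority_error(gamma)
    print(f"gamma={gamma}: m={m}, error={err:.4f}, Hoeffding bound={hoeffding:.4f}")
```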

6 Threshold Direct Product Theorems 

In this section we prove our 'gambling lemma', Lemma 9, and use it to prove generalized threshold DPTs for relation problems (defined in Section 6.2). This will yield DPTs for non-Boolean functions as well as for the query complexity of learning tasks. Further applications of Lemma 9 will appear in later sections.
Let $\mathcal{P}([k])$ denote the collection of subsets of $[k]$. Say that a subcollection $\mathcal{A} \subseteq \mathcal{P}([k])$ is monotone if $A \in \mathcal{A}$, $A \subseteq B$ implies $B \in \mathcal{A}$. Monotone collections play an important role in what follows.
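For experimenting with small cases, monotone collections are easy to represent explicitly. The Python sketch below (illustrative helper names, with $[k]$ indexed as $\{0, \ldots, k-1\}$) tests upward closure, using the observation that a family is monotone exactly when it is closed under adding a single element.

```python
from itertools import combinations


def all_subsets(k: int) -> list[frozenset[int]]:
    """All subsets of [k] = {0, ..., k-1} (0-indexed for convenience)."""
    return [frozenset(c) for r in range(k + 1) for c in combinations(range(k), r)]


def is_monotone(family: set[frozenset[int]], k: int) -> bool:
    """Upward-closure test: adding any single element must stay inside the family."""
    return all((a | {j}) in family for a in family for j in range(k))


def threshold_family(k: int, s: float) -> set[frozenset[int]]:
    """The family {A subset of [k] : |A| >= s}."""
    return {a for a in all_subsets(k) if len(a) >= s}


print(is_monotone(threshold_family(4, 2), 4))  # True
print(is_monotone({frozenset({0})}, 4))        # False: {0} is in, {0, 1} is not
```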

6.1 A gambling lemma 

We now prove a technical result, Lemma 9 below, that will play a key role in the rest of the paper. Like the proof of Theorem 1, this Lemma's statement is best explained by a gambling metaphor. Suppose that a gambler gambles at $k$ tables, bringing an initial endowment of $p_j \in [0, 1]$ to the $j$-th table. He cannot transfer funds between tables, or go into debt at any table; he can only play games for which his expected winnings are nonpositive; and the different tables' games use independent randomness. However, the gambler can choose which game to play next at each table. The gambler wants to reach a fortune of 1 at 'sufficiently many' of the tables, where the meaning of 'sufficiently many' is specified by a monotone subset $\mathcal{A} \subseteq \mathcal{P}([k])$. One way the gambler may attempt to reach this goal is to simply place an 'all-or-nothing' bet independently at each table; that is, at the $j$-th table, the gambler wins a fortune of 1 with probability $p_j$, and loses his $j$-th endowment with the remaining probability. The following Lemma states that this is in fact the gambler's best strategy.

Lemma 9. Suppose $k, N \ge 1$ are given, along with a collection $\{X, U\}$ of random variables (over a finite probability space). Here $X = \{X_1, \ldots, X_k\}$, where for each $j \in [k]$, $X_j = \{X_{j,0}, X_{j,1}, \ldots, X_{j,N}\}$ is a sequence of variables in the range $[0, 1]$ (think of $X_{j,t}$ as the gambler's fortune at the $j$-th table after the first $t$ steps). $U = \{U_0, U_1, \ldots, U_{N-1}\}$ is a sequence of random variables taking values over some finite set (think of $U_t$ as describing the form and outcomes of all gambles in the first $t$ steps). Assume that for all $0 \le t' \le t < N$ and $j \in [k]$, $X_{j,t'}$ is determined by $U_t$, and that $\mathbb{E}[X_{j,t+1} \mid U_t] \le X_{j,t}$. Also assume that $\{X_{1,t+1}, \ldots, X_{k,t+1}\}$ are independent conditioned on $U_t$.

Then, if $X_{j,0} \le p_j \in [0, 1]$ for all $j \in [k]$, and $\mathcal{A}$ is a monotone subset of $\mathcal{P}([k])$, we have

$$\Pr[\{j \in [k] : X_{j,N} = 1\} \in \mathcal{A}] \le \Pr[D \in \mathcal{A}],$$

where $D \subseteq [k]$ is generated by independently including each $j \in [k]$ in $D$ with probability $p_j$.

Note that we assume the gambler never attains a fortune greater than 1 at any table; this 
restriction is easily removed, but it holds naturally in the settings where we'll apply the Lemma. 
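The following Monte Carlo sketch in Python illustrates Lemma 9 on a toy instance. Each of $k = 4$ tables starts with fortune $3/10$; at every step the gambler adaptively picks the live table with the largest fortune and makes a fair 'bold play' bet (stake $\min(x, 1-x)$), so each fortune sequence is a martingale in $[0, 1]$ and only one fortune changes per step. The target family is $\mathcal{A} = \{A : |A| \ge 2\}$. The strategy, step budget, seed, and trial count are arbitrary illustrative choices; the estimate should not exceed the exact value $\Pr[D \in \mathcal{A}]$ beyond sampling noise.

```python
import random
from fractions import Fraction
from itertools import product


def prob_d_in_family(ps, in_family) -> Fraction:
    """Exact Pr[D in A], where D contains j independently with probability ps[j]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(ps)):
        weight = Fraction(1)
        for bit, p in zip(bits, ps):
            weight *= p if bit else 1 - p
        if in_family({j for j, bit in enumerate(bits) if bit}):
            total += weight
    return total


def play_once(ps, steps: int, rng: random.Random) -> set[int]:
    """One run of an adaptive gambler making fair bold-play bets; returns {j : X_{j,N} = 1}."""
    fortunes = [Fraction(p) for p in ps]
    for _ in range(steps):
        live = [j for j, x in enumerate(fortunes) if 0 < x < 1]
        if not live:
            break
        j = max(live, key=lambda i: fortunes[i])  # choice depends on the whole history
        x = fortunes[j]
        stake = min(x, 1 - x)                      # fair bet that keeps x in [0, 1]
        fortunes[j] = x + stake if rng.random() < 0.5 else x - stake
    return {j for j, x in enumerate(fortunes) if x == 1}


def at_least_two(s: set[int]) -> bool:
    """A monotone family: at least two tables reach fortune 1."""
    return len(s) >= 2


ps = [Fraction(3, 10)] * 4
rng = random.Random(0)
trials = 5000
hits = sum(at_least_two(play_once(ps, 40, rng)) for _ in range(trials))
estimate = hits / trials
bound = prob_d_in_family(ps, at_least_two)
print(f"estimated success {estimate:.4f} vs. Pr[D in A] = {float(bound):.4f}")
```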

Proof. We use the term '$\mathcal{A}$-success' to refer to the event $[\{j \in [k] : X_{j,N} = 1\} \in \mathcal{A}]$ whose probability we are bounding.

We first make a simplifying observation: we claim it is without loss of generality to assume that between each pair of consecutive times $(t, t+1)$, at most one of the fortunes changes, and that the fortune subject to change is determined by $U_t$. To see this, consider any family $X$ obeying the Lemma's assumptions, and 'split' each transition $(t, t+1)$ into a sequence of $k$ transitions, in the $j$-th of which the $j$-th fortune is subject to change (according to the same distribution governing its transition between $t$ and $t+1$ in the original sequence). The Lemma's assumptions continue to hold for this modified family of sequences; here we are using our original assumption that $\{X_{1,t+1}, \ldots, X_{k,t+1}\}$ are independent conditioned on $U_t$. Also, the probability of $\mathcal{A}$-success is unchanged. So assume from now on that $X$ obeys this extra assumption, and for $0 \le t < N$, let $j_t \in [k]$ be the index of the fortune subject to change between times $t$ and $t+1$.

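Fix any $k \ge 1$; we prove the statement by induction on $N \ge 1$. First suppose $N = 1$, and let $j_0$ be as defined above. Let $S \subseteq [k] \setminus \{j_0\}$ be the set of indices $j \ne j_0$ for which $p_j = 1$. First suppose $S \in \mathcal{A}$; then $\Pr[D \in \mathcal{A}] = 1$, since each $j \in S$ is included in $D$ with probability 1. In this case the conclusion is trivially satisfied. Next suppose $S \cup \{j_0\} \notin \mathcal{A}$. In this case $\Pr[\mathcal{A}\text{-success}] = 0$ (for $j \ne j_0$ we have $X_{j,1} = X_{j,0} \le p_j$, so $\{j : X_{j,1} = 1\} \subseteq S \cup \{j_0\}$), and again the conclusion is trivially satisfied. So suppose $S \notin \mathcal{A}$, $S \cup \{j_0\} \in \mathcal{A}$, and condition on any value $U_0 = u$. Then $\mathcal{A}$-success occurs only if $X_{j_0,1} = 1$. By Markov's inequality, $\Pr[X_{j_0,1} = 1 \mid U_0 = u] \le \mathbb{E}[X_{j_0,1} \mid U_0 = u] \le X_{j_0,0} \le p_{j_0} \le \Pr[D \in \mathcal{A}]$, where the last step holds because $D$ always contains $S$, so by monotonicity $D \in \mathcal{A}$ whenever $j_0 \in D$. This proves the statement for $N = 1$.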

So let $N > 1$ and assume the statement proved for $\{1, \ldots, N-1\}$; we prove it for $N$. Condition on any value $U_0 = u$, and condition further on the value $X_{j_0,1} = a \in [0, 1]$. The equalities $X_{j,1} = X_{j,0} \le p_j$ are forced for all $j \ne j_0$; the residual collection of random variables $\{X_{j,t} : j \in [k], 1 \le t \le N\} \cup \{U_t : 1 \le t < N\}$ under our conditioning obeys the Lemma's assumptions, along with our added assumption; and these sequences are shorter by a step than our initial sequences. Thus our induction hypothesis implies that

$$\Pr[\mathcal{A}\text{-success} \mid U_0 = u, X_{j_0,1} = a] \le \Pr[D^{(a)} \in \mathcal{A}], \qquad (2)$$

where $D^{(a)}$ is generated just like $D$ except that $j_0$ is now included in $D^{(a)}$ with probability $a$.

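Let $q_0 := \Pr[D \setminus \{j_0\} \in \mathcal{A}]$ and $q_1 := \Pr[D \cup \{j_0\} \in \mathcal{A}]$. Note that $q_0 \le q_1$, since $\mathcal{A}$ is monotone. We have

$$\Pr[D^{(a)} \in \mathcal{A}] = (1 - a) q_0 + a q_1.$$

Taking expectations over $a$ in Eq. (2),

$$
\begin{aligned}
\Pr[\mathcal{A}\text{-success} \mid U_0 = u] &\le (1 - \mathbb{E}[X_{j_0,1} \mid U_0 = u]) q_0 + \mathbb{E}[X_{j_0,1} \mid U_0 = u] \cdot q_1 \\
&\le (1 - p_{j_0}) q_0 + p_{j_0} q_1 \qquad (\text{since } q_0 \le q_1 \text{ and } \mathbb{E}[X_{j_0,1} \mid U_0 = u] \le X_{j_0,0} \le p_{j_0}) \\
&= \Pr[D \in \mathcal{A}].
\end{aligned}
$$

As $u$ was arbitrary, this extends the induction to $N$, and completes the proof. □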
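The key identity in the induction step, $\Pr[D^{(a)} \in \mathcal{A}] = (1-a)q_0 + aq_1$ with $q_0 \le q_1$, can be checked exactly on a small example. The Python sketch below uses exact rational arithmetic; the family $\mathcal{A}$ (sets that contain index 0 or have at least two elements, with $j_0 = 0$) and the inclusion probabilities are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product


def prob_d_in_family(ps, in_family) -> Fraction:
    """Exact Pr[D in A], where D contains j independently with probability ps[j]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(ps)):
        weight = Fraction(1)
        for bit, p in zip(bits, ps):
            weight *= p if bit else 1 - p
        if in_family({j for j, bit in enumerate(bits) if bit}):
            total += weight
    return total


def in_family(s: set[int]) -> bool:
    """A monotone family: contains index 0, or has at least two elements."""
    return 0 in s or len(s) >= 2


def with_p0(ps, a):
    """Copy of ps with the inclusion probability of j_0 = 0 replaced by a (this is D^(a))."""
    return [a] + list(ps[1:])


ps = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 5)]
q0 = prob_d_in_family(with_p0(ps, Fraction(0)), in_family)  # Pr[D minus {j_0} in A]
q1 = prob_d_in_family(with_p0(ps, Fraction(1)), in_family)  # Pr[D plus {j_0} in A]
for a in (Fraction(0), Fraction(1, 4), Fraction(2, 3), Fraction(1)):
    lhs = prob_d_in_family(with_p0(ps, a), in_family)
    print(a, lhs, (1 - a) * q0 + a * q1)  # the last two columns agree
```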

6.2 Application to threshold DPTs 

Now we prove our generalized threshold direct product theorem. First we define relation problems. A relation (with Boolean domain) is a subset $P \subseteq \{0,1\}^n \times B$, for some finite set $B$. The relation is total if for all $x \in \{0,1\}^n$, there exists $b \in B$ such that $(x, b) \in P$. For each total relation $P$ there is a natural computational problem: given an input $x$, try to output a $b$ for which $(x, b) \in P$. Of course, computing a function $f : \{0,1\}^n \to B$ is equivalent to solving the relation problem for the total relation $P_f := \{(x, b) : f(x) = b\}$.

If $\mathcal{R}$ is a (possibly randomized) query algorithm producing outputs in $B$, and $P$ is a total relation, say that $\mathcal{R}$ $\varepsilon$-solves the relation problem $P$ if for all inputs $x$, $\Pr[(x, \mathcal{R}(x)) \in P] \ge 1 - \varepsilon$. Similarly, if $\mu$ is a distribution over inputs $x \in \{0,1\}^n$, we say that $\mathcal{R}$ $\varepsilon$-solves the relation problem $P$ with respect to $\mu$ if $\Pr_{x \sim \mu}[(x, \mathcal{R}(x)) \in P] \ge 1 - \varepsilon$. Define $\mathrm{Suc}_{T,\mu}(P) := 1 - \varepsilon$, where $\varepsilon \ge 0$ is the minimum value for which some $T$-query randomized algorithm $\mathcal{R}$ $\varepsilon$-solves $P$ with respect to $\mu$. As usual, this optimum exists and is attained by a deterministic height-$T$ decision tree. Given any randomized algorithm $\mathcal{R}$ making queries to $k \ge 1$ inputs $x = (x^1, \ldots, x^k)$ to the relation problem $P$ and producing an output in $B^k$, let $\mathcal{R}_j(x) \in B$ be the $j$-th value output by $\mathcal{R}$.

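Given $A, B \subseteq [k]$, define the distance $d(A, B) := |(A \setminus B) \cup (B \setminus A)|$. Given a set family $\mathcal{A} \subseteq \mathcal{P}([k])$ and a real number $r > 0$, define the strict $r$-neighborhood of $\mathcal{A}$, denoted $N_r(\mathcal{A})$, as

$$N_r(\mathcal{A}) := \{B : d(A, B) < r \text{ for some } A \in \mathcal{A}\}.$$

We have $\mathcal{A} \subseteq N_r(\mathcal{A})$. Note also that if $\mathcal{A}$ is monotone then so is $N_r(\mathcal{A})$. We are now prepared to state our generalized threshold DPT: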

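Theorem 10. Let $P \subseteq \{0,1\}^n \times B$ be a total relation for which $\mathrm{Suc}_{T,\mu}(P) \le 1 - \varepsilon$. Fixing any randomized algorithm $\mathcal{R}$ making queries to inputs $x = (x^1, \ldots, x^k) \sim \mu^{\otimes k}$ and producing output in $B^k$, define the (random) set

$$S[x] := \{j \in [k] : (x^j, \mathcal{R}_j(x)) \in P\}.$$

Suppose $\mathcal{R}$ is $\alpha\varepsilon T k$-query-bounded for some $\alpha \in (0, 1]$, and $\mathcal{A}$ is any monotone subset of $\mathcal{P}([k])$. Then:

1. $\Pr[S[x] \in \mathcal{A}] \le |B|^{\alpha\varepsilon k} \cdot \Pr[D \in \mathcal{A}]$, where $D \subseteq [k]$ is generated by independently including each $j \in [k]$ in $D$ with probability $1 - \varepsilon$.

2. Also, for $D$ as above, $\Pr[S[x] \in \mathcal{A}] \le \Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})]$.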

Proof. As in Theorem 1, we may assume $\varepsilon, T > 0$ and $\mathrm{supp}(\mu) = \{0,1\}^n$. We have $\varepsilon \le 1 - |B|^{-1} < 1$, since $P$ is total and an algorithm may output a uniformly random element of $B$.

For $u \in \{0,1,*\}^n$ and a deterministic algorithm $\mathcal{D}$ on $n$ input bits, let $W_P(u, \mathcal{D}) := \Pr_{y \sim \mu^{(u)}}[(y, \mathcal{D}(y)) \in P]$. If $|u| \le T$, define $W_P(u) := \max_{\mathcal{D}} W_P(u, \mathcal{D})$, where the max ranges over all deterministic algorithms $\mathcal{D}$ making at most $T - |u|$ queries. Then $W_P(u) \in [|B|^{-1}, 1]$. We have the following claim, whose proof follows that of Lemma 6.

Lemma 11. 1. $W_P(*^n) \le 1 - \varepsilon$.

2. For any $u \in \{0,1,*\}^n$ with $|u| < T$, and any $i \in [n]$, $\mathbb{E}_{y \sim \mu^{(u)}}[W_P(u[x_i \leftarrow y_i])] \le W_P(u)$.

Now we prove the Theorem. Let $\mathcal{R}$ be given; as in Theorem 1, we may assume $\mathcal{R}$ is deterministic, so call it $\mathcal{D}$ instead. Let $M := \lfloor \alpha\varepsilon T k \rfloor$ as before, and recall the random strings $u^j_t$ defined in Theorem 1.

Define random variables $\{X_{j,t}\}_{j \in [k], 0 \le t \le M}$, determined by an execution of $\mathcal{D}$ on inputs $(x^1, \ldots, x^k) \sim \mu^{\otimes k}$, by letting $X_{j,t} := W_P(u^j_t)$ if $|u^j_t| \le T$, otherwise $X_{j,t} := |B|^{-1}$. Next, the natural idea is to apply Lemma 9. First, however, we need to extend the sequences for one additional (non-query) step. That is, we will define random variables $X_{j,M+1}$ for each $j \in [k]$. We will use $X$ to denote the collection of enlarged sequences.

Our definition of $X_{j,M+1}$ depends on whether $|u^j_M| \le T$, that is, on whether $\mathcal{D}$ made at most $T$ queries to $x^j$ on the current execution. If $|u^j_M| \le T$, let $X_{j,M+1} := \mathbb{1}[(x^j, \mathcal{D}_j(x)) \in P]$ be the indicator variable for the event that $\mathcal{D}$ solves the relation problem on the $j$-th input. If $|u^j_M| > T$, let $X_{j,M+1} := 1$ with probability $|B|^{-1}$, and let $X_{j,M+1} := 0$ with the remaining probability. We let each such 'coin-flip' be independent of the others and of $(x^1, \ldots, x^k)$.

Define the collection $U = \{U_0, \ldots, U_M\}$ by $U_t := (u^1_t, \ldots, u^k_t)$. We argue that the conditions of Lemma 9 are satisfied by $(X, U)$, with $N := M + 1$. First, for $0 \le t' \le t \le M$, $X_{j,t'}$ is determined by $U_t$ as needed (since $\mathcal{D}$ is deterministic). For $0 \le t < M$, there is always at most one index $j \in [k]$ for which $X_{j,t+1} \ne X_{j,t}$, and this index is determined by $U_t$ (again since $\mathcal{D}$ is deterministic). Thus, conditioned on $U_t$, the variables $X_{1,t+1}, \ldots, X_{k,t+1}$ are independent. Using part 2 of Lemma 11 and the fact (Lemma 7) that $x^j \sim \mu^{(u^j_t)}$ conditioned on $U_t$, we have $\mathbb{E}[X_{j,t+1} \mid U_t] \le X_{j,t}$ for each $j$.

Now consider the final, added step. Condition on any value of $U_M = (u^1_M, \ldots, u^k_M)$. Lemma 7 tells us that $x^1, \ldots, x^k$ are independent under this conditioning, and $\mathcal{D}$'s outputs are determined by $U_M$, so the variables $\{X_{j,M+1}\}$ are independent conditioned on $U_M$. If $|u^j_M| \le T$ then $\mathbb{E}[X_{j,M+1} \mid U_M] \le X_{j,M}$ by part 2 of Lemma 11. If $|u^j_M| > T$ then $\mathbb{E}[X_{j,M+1} \mid U_M] = |B|^{-1} = X_{j,M}$.

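Thus the assumptions of Lemma 9 are satisfied, with $p_j := 1 - \varepsilon$ (note that $X_{j,0} = W_P(*^n) \le 1 - \varepsilon$ by part 1 of Lemma 11). We conclude that for any monotone $\mathcal{C} \subseteq \mathcal{P}([k])$,

$$\Pr[\{j \in [k] : X_{j,N} = 1\} \in \mathcal{C}] \le \Pr[D \in \mathcal{C}], \qquad (3)$$

where each $j \in [k]$ is independently included in $D$ with probability $1 - \varepsilon$.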

To prove statement 1 of the Theorem, let $\mathcal{C} := \mathcal{A}$. Note that $S[x]$ and $u^1_M, \ldots, u^k_M$ are determined by $x$, since $\mathcal{D}$ is deterministic. Condition on any value of $x$ for which $S[x] \in \mathcal{A}$. Under this conditioning, if $j \in [k]$ satisfies $|u^j_M| \le T$ and $j \in S[x]$, then $X_{j,N} = 1$. On the other hand, if $|u^j_M| > T$, then $[X_{j,N} = 1]$ holds with probability $|B|^{-1}$, and these events are independent for each such $j$. By the query bound on $\mathcal{D}$, there are fewer than $\alpha\varepsilon k$ indices $j$ in our conditioning for which $|u^j_M| > T$. Thus,

$$\Pr[\{j \in [k] : X_{j,N} = 1\} \in \mathcal{A} \mid S[x] \in \mathcal{A}] \ge |B|^{-\alpha\varepsilon k},$$

which in combination with Eq. (3) implies

$$\Pr[S[x] \in \mathcal{A}] \le |B|^{\alpha\varepsilon k} \cdot \Pr[D \in \mathcal{A}],$$

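as needed. To prove statement 2 of the Theorem, let $\mathcal{C} := N_{\alpha\varepsilon k}(\mathcal{A})$ in Eq. (3): we find

$$\Pr[\{j \in [k] : X_{j,N} = 1\} \in N_{\alpha\varepsilon k}(\mathcal{A})] \le \Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})]. \qquad (4)$$

Arguing as above, $S[x]$ and $\{j \in [k] : X_{j,N} = 1\}$ can differ only on indices $j$ with $|u^j_M| > T$, so their distance is always less than $\alpha\varepsilon k$; hence $[S[x] \in \mathcal{A}]$ implies $[\{j \in [k] : X_{j,N} = 1\} \in N_{\alpha\varepsilon k}(\mathcal{A})]$. Thus, Eq. (4) implies $\Pr[S[x] \in \mathcal{A}] \le \Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})]$. □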
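The strict neighborhood $N_r(\cdot)$ used in statement 2 can be computed by brute force for small $k$. The Python sketch below (illustrative names, $[k]$ indexed from 0) implements $d$ as the size of the symmetric difference and checks, for the threshold family $\{A : |A| \ge 4\}$ with $k = 5$ and $r = 1.5$, that $\mathcal{A} \subseteq N_r(\mathcal{A})$ and that monotonicity is preserved.

```python
from itertools import combinations


def all_subsets(k: int) -> list[frozenset[int]]:
    return [frozenset(c) for r in range(k + 1) for c in combinations(range(k), r)]


def distance(a: frozenset[int], b: frozenset[int]) -> int:
    """d(A, B): the size of the symmetric difference of A and B."""
    return len(a ^ b)


def strict_neighborhood(family: set[frozenset[int]], r: float, k: int) -> set[frozenset[int]]:
    """N_r(A) = {B : d(A, B) < r for some A in the family}."""
    return {b for b in all_subsets(k) if any(distance(a, b) < r for a in family)}


def is_monotone(family: set[frozenset[int]], k: int) -> bool:
    return all((a | {j}) in family for a in family for j in range(k))


k = 5
family = {a for a in all_subsets(k) if len(a) >= 4}  # {A : |A| >= 4}
nbhd = strict_neighborhood(family, 1.5, k)
print(len(nbhd))  # 16: exactly the sets of size >= 3
```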

Part 1 of Theorem 10 is a proper generalization of Theorem 1. To see this, just set $\mathcal{A} := \{[k]\}$, $P := P_f$, and note that in this case, $\Pr[D \in \mathcal{A}] = (1 - \varepsilon)^k$. As another immediate corollary, we obtain the following threshold DPT for relation problems, which specializes to an ordinary DPT for this setting (statement 3 in the Theorem below).

Theorem 12. Let $P \subseteq \{0,1\}^n \times B$ be a total relation for which $\mathrm{Suc}_{T,\mu}(P) \le 1 - \varepsilon$. Fix any $\eta \in (0, 1]$. For any randomized algorithm $\mathcal{R}$ making queries to inputs $x = (x^1, \ldots, x^k) \sim \mu^{\otimes k}$, define the (random) set $S[x]$ as in Theorem 10. Then if $\mathcal{R}$ is $\alpha\varepsilon T k$-query-bounded for $\alpha \in (0, 1]$, we have:
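1. $\Pr[|S[x]| \ge \eta k] \le |B|^{\alpha\varepsilon k} \cdot \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge \eta k]$, and also

2. $\Pr[|S[x]| \ge \eta k] \le \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge (\eta - \alpha\varepsilon) k]$.

3. $\Pr[S[x] = [k]] \le \min\{|B|^{\alpha\varepsilon k}(1 - \varepsilon)^k, \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge (1 - \alpha\varepsilon) k]\}$. If $\alpha \le 1/2$, the second bound in the min is at most

$$[1 - \varepsilon + 21\alpha\ln(1/\alpha)\varepsilon]^k.$$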

Proof. Apply parts 1 and 2 of Theorem 10 with the choice $\mathcal{A} := \{A \subseteq [k] : |A| \ge \eta k\}$. We have $\Pr[D \in \mathcal{A}] = \Pr[D_1 + \cdots + D_k \ge \eta k]$, where we define $D_j := \mathbb{1}[j \in D]$. These 0/1-valued variables are independent with bias $1 - \varepsilon$, which gives statement 1. Similarly, $\Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})] \le \Pr[D_1 + \cdots + D_k \ge (\eta - \alpha\varepsilon) k]$, which gives statement 2. Statement 3 simply combines statements 1 and 2, under the setting $\eta = 1$. For the final bound in statement 3, we apply Lemma 5 with $\beta := \alpha$, $\delta := \varepsilon$. □
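To see how the bounds in Theorem 12 compare numerically, the Python sketch below evaluates the binomial tails exactly (in log space) for some arbitrary illustrative parameters. The closed form in `lemma5_form` is transcribed from statement 3 above (stated there for $\alpha \le 1/2$); all function names are hypothetical, and $0 < \varepsilon < 1$ is assumed so that the logarithms are defined.

```python
import math


def binom_tail(k: int, p: float, threshold: float) -> float:
    """Pr[Y >= threshold] for Y ~ Binomial(k, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for y in range(max(0, math.ceil(threshold)), k + 1):
        log_pmf = (math.lgamma(k + 1) - math.lgamma(y + 1) - math.lgamma(k - y + 1)
                   + y * math.log(p) + (k - y) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def theorem12_bounds(k: int, eps: float, alpha: float, eta: float, out_size: int) -> tuple[float, float]:
    """Bounds of statements 1 and 2 of Theorem 12 on Pr[|S[x]| >= eta*k]; out_size plays |B|."""
    p = 1 - eps
    bound1 = out_size ** (alpha * eps * k) * binom_tail(k, p, eta * k)
    bound2 = binom_tail(k, p, (eta - alpha * eps) * k)
    return bound1, bound2


def lemma5_form(k: int, eps: float, alpha: float) -> float:
    """[1 - eps + 21*alpha*ln(1/alpha)*eps]^k, the closed form quoted in statement 3."""
    return (1 - eps + 21 * alpha * math.log(1 / alpha) * eps) ** k


k, eps, alpha = 200, 0.3, 0.005
print(theorem12_bounds(k, eps, alpha, eta=0.9, out_size=2))
exact = binom_tail(k, 1 - eps, (1 - alpha * eps) * k)
print(exact, lemma5_form(k, eps, alpha))  # exact tail vs. closed-form bound
```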

Theorem 4 in the Introduction follows from the special case of Theorem 12 in which $P := P_f$.

The success bound $|B|^{\alpha\varepsilon k}(1 - \varepsilon)^k$ appearing above can also be derived by an easy modification of the proof of Theorem 1, in which the condition $X_{j,t} \ge 1/2$ we exploit becomes $X_{j,t} \ge |B|^{-1}$. When $|B|$ is large, however, the alternative bound provided in Theorem 12 will tend to give better results.

Note that part 2 of Theorem 12, in conjunction with Chernoff inequalities, gives success bounds which decay exponentially in $k$ for any fixed $\alpha, \varepsilon, \eta$ for which $\eta > 1 - \varepsilon + \alpha\varepsilon$. Shaltiel's examples, described in Section 4, show that this cutoff is nearly tight: on those functions, the algorithm described in Section 4 makes $2k + \alpha\varepsilon T k$ queries and (it is easily checked) typically solves about $(1 - \varepsilon + .5\alpha\varepsilon) k$ of the instances correctly.

Threshold DPTs for the worst-case setting can also be derived from Theorems 10 and 12 by the same reduction to the average-case setting used to prove Theorem 2.

6.3 Direct product theorems for learning tasks 

Theorems 10 and 12 readily imply direct product theorems for the query complexity of learning tasks, as we explain next. Consider the scenario in which a randomized algorithm $\mathcal{R}$ is given query access to some unknown function $h : \{0,1\}^n \to \{0,1\}$ drawn from some distribution $\mu$ over a hypothesis class $\mathcal{H}$. That is, for any string $x$, $\mathcal{R}$ can query the value $h(x)$. The algorithm $\mathcal{R}$ attempts to output a hypothesis $h'$ which is 'close' to $h$, that is, such that $\mathrm{close}(h, h')$ holds, where $\mathrm{close} \subseteq \mathcal{H} \times \mathcal{H}$ is some symmetric relation (assume $\mathrm{close}(h, h)$ always holds).

This task can be equivalently modeled as the relation problem associated with the total relation

$$P_{\mathcal{H}} := \{(h, h') : h, h' \in \mathcal{H} \wedge \mathrm{close}(h, h')\},$$

where $h$ is given in truth-table form as a Boolean string, under the input distribution $h \sim \mu$. (We don't give a membership criterion for $P_{\mathcal{H}}$ when $h \notin \mathcal{H}$; this is unimportant since $\mathrm{supp}(\mu) \subseteq \mathcal{H}$.)

In the $k$-fold learning problem associated with $\mathcal{H}, \mu$, the algorithm has query access to each of $k$ functions $(h_1, \ldots, h_k) \sim \mu^{\otimes k}$, and the goal is to output guesses $h'_1, \ldots, h'_k$ such that $\mathrm{close}(h_j, h'_j)$ holds for all, or at least 'many', indices $j \in [k]$. This task is equivalent to the $k$-fold relation problem associated with $P_{\mathcal{H}}$, and Theorems 10 and 12 apply.
7 Proof of the XOR lemma

The proof of our XOR Lemma, Theorem 3 from the Introduction, is modeled on the proof of our threshold DPTs, and reuses Lemma 9.

Proof of Theorem 3. As usual we first set up some preliminaries. For a deterministic algorithm $\mathcal{D}$ over $n$ input bits define

$$W^{\oplus}(u, \mathcal{D}) := 2 \cdot \Pr_{y \sim \mu^{(u)}}[\mathcal{D}(y) = f(y)] - 1.$$

If $|u| \le T$, let $W^{\oplus}(u) := \max_{\mathcal{D}} W^{\oplus}(u, \mathcal{D})$, where the maximum ranges over all deterministic algorithms making at most $T - |u|$ queries.

Lemma 13. 1. $W^{\oplus}(*^n) \le 1 - 2\varepsilon$.

2. For any $u \in \{0,1,*\}^n$ with $|u| < T$, and any $i \in [n]$, $\mathbb{E}_{y \sim \mu^{(u)}}[W^{\oplus}(u[x_i \leftarrow y_i])] \le W^{\oplus}(u)$.

Lemma 13 follows immediately from Lemma 6, since $W^{\oplus}(u) = 2W(u) - 1$.

Now we prove the Theorem. As in the proof of Theorem 1, we may assume $\varepsilon, T > 0$ and $\mathrm{supp}(\mu) = \{0,1\}^n$, and it is enough to prove the success bound for each deterministic $\alpha\varepsilon T k$-query algorithm $\mathcal{D}$ attempting to solve $f^{\oplus k}(x^1, \ldots, x^k)$ on inputs $x^1, \ldots, x^k \sim \mu^{\otimes k}$. Recall the definitions of $u^j_t$ (for $j \in [k]$, $0 \le t \le M$) from Theorem 1. For a deterministic algorithm $\mathcal{D}$ define $\{X_{j,t}\}_{j \in [k], 0 \le t \le M}$ as follows: if $|u^j_t| \le T$, set $X_{j,t} := W^{\oplus}(u^j_t)$; otherwise, set $X_{j,t} := 0$.

We will extend the random sequences $\{X_{j,t}\}$ for one additional (non-query) step, and will let $X$ denote our enlarged collection. To set up our extension, we first define random variables $b_j, r_j, a_j$ for $j \in [k]$, determined by $u^j_M$, as follows. Let $b_j \in \{0,1\}$ be defined as the likeliest value of $f(y)$, where $y \sim \mu^{(u^j_M)}$ (break ties arbitrarily). Let $r_j := \Pr[f(y) = b_j] \in [1/2, 1]$, where again $y \sim \mu^{(u^j_M)}$. Let $a_j := 2r_j - 1 \in [0, 1]$.

If $|u^j_M| > T$, set $X_{j,M+1} := 0$. If instead $|u^j_M| \le T$, our random process 'inspects' the actual value of the bit $f(x^j)$ to help determine $X_{j,M+1}$. If $f(x^j) \ne b_j$, let $X_{j,M+1} := 0$. If $f(x^j) = b_j$, let $X_{j,M+1} := 1$ with probability $a_j/r_j$ (note that $a_j/r_j \le 1$ since $r_j \le 1$), and $X_{j,M+1} := 0$ with the remaining probability, where this random decision is independent of all others. Thus in this case,

$$\mathbb{E}[X_{j,M+1} \mid u^1_M, \ldots, u^k_M] = r_j \cdot (a_j/r_j) = a_j \le X_{j,M},$$

where the last inequality holds by the definition of $W^{\oplus}(u^j_M)$, since $|u^j_M| \le T$.

Let $U = \{U_0, \ldots, U_M\}$, where $U_t := (u^1_t, \ldots, u^k_t)$. By an argument analogous to that in the proof of Theorem 10, we verify that $(X, U)$ obey the assumptions of Lemma 9, this time with $p_j := 1 - 2\varepsilon$ (since $X_{j,0} \le 1 - 2\varepsilon$). Applying Lemma 9 to $\mathcal{A} := \{A \subseteq [k] : |A| \ge (1 - \alpha\varepsilon) k\}$, we find

$$\Pr[|\{j : X_{j,M+1} = 1\}| \ge (1 - \alpha\varepsilon) k] \le \Pr[D \in \mathcal{A}], \qquad (5)$$

where each $j \in [k]$ is independently included in $D$ with probability $1 - 2\varepsilon$. We have $\Pr[D \in \mathcal{A}] = \Pr_{Y \sim \mathcal{B}_{k,1-2\varepsilon}}[Y \ge (1 - \alpha\varepsilon) k]$.

We analyze events $F$ of the form $F := [U_M = (u^1_M, \ldots, u^k_M), X_{1,M+1} = z_1, \ldots, X_{k,M+1} = z_k]$. Note that conditioning on $F$ does not condition on the particular values $f(x^j)$ which helped determine the values $z_j$. Focus attention on any such event $F$ for which $|\{j : X_{j,M+1} = 1\}| < (1 - \alpha\varepsilon) k$. Since $\mathcal{D}$ makes at most $\alpha\varepsilon T k$ queries, there are fewer than $\alpha\varepsilon k$ indices $j$ for which $|u^j_M| > T$. In particular, there exists a $j^* \in [k]$ for which $|u^{j^*}_M| \le T$ and $X_{j^*,M+1} < 1$ (so, by our definitions, $X_{j^*,M+1} = 0$).

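Now let the event $F'$ be defined just like $F$, except that $F'$ makes no conditioning on $X_{j^*,M+1}$ (so $F = F' \wedge [X_{j^*,M+1} = 0]$). Then,

$$
\begin{aligned}
\Pr[f(x^{j^*}) = b_{j^*} \mid F] &= \Pr[f(x^{j^*}) = b_{j^*} \mid F' \wedge X_{j^*,M+1} = 0] \\
&= \frac{\Pr[f(x^{j^*}) = b_{j^*} \wedge X_{j^*,M+1} = 0 \mid F']}{\Pr[X_{j^*,M+1} = 0 \mid F']} \\
&= \frac{\Pr[f(x^{j^*}) = b_{j^*} \mid F'] \cdot \Pr[X_{j^*,M+1} = 0 \mid F', f(x^{j^*}) = b_{j^*}]}{\Pr[f(x^{j^*}) = b_{j^*} \mid F'] \cdot \Pr[X_{j^*,M+1} = 0 \mid F', f(x^{j^*}) = b_{j^*}] + \Pr[f(x^{j^*}) \ne b_{j^*} \mid F'] \cdot \Pr[X_{j^*,M+1} = 0 \mid F', f(x^{j^*}) \ne b_{j^*}]} \\
&= \frac{r_{j^*}(1 - a_{j^*}/r_{j^*})}{r_{j^*}(1 - a_{j^*}/r_{j^*}) + (1 - r_{j^*}) \cdot 1} \\
&= \frac{\frac{1}{2}(1 + a_{j^*}) - a_{j^*}}{1 - a_{j^*}} = \frac{\frac{1}{2}(1 - a_{j^*})}{1 - a_{j^*}} \\
&= 1/2,
\end{aligned}
$$

where the fourth line uses the fact that $x^1, \ldots, x^k$ are independent conditioned on $U_M$ (by Lemma 7), together with the additional fact that $\{X_{j,M+1}\}_{j \in [k]}$ are independent conditioned on $U_M$, and the fifth line uses $r_{j^*} = (1 + a_{j^*})/2$.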



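Thus, $f(x^{j^*})$ is an unbiased random bit conditioned on $F$. Consequently, since conditioned on $F$ the bit $f(x^{j^*})$ is independent of the other inputs, $f^{\oplus k}(x^1, \ldots, x^k) = f(x^{j^*}) \oplus f^{\oplus(k-1)}(x^1, \ldots, x^{j^*-1}, x^{j^*+1}, \ldots, x^k)$ is an unbiased random bit conditioned on $F$. Thus under this conditioning, $\mathcal{D}$'s output bit (which is determined by $U_M$) equals the $k$-fold XOR with probability no more (and no less) than $1/2$. Now $F$ was an arbitrary outcome of $U_M, X_{1,M+1}, \ldots, X_{k,M+1}$ for which $|\{j : X_{j,M+1} = 1\}| < (1 - \alpha\varepsilon) k$. It follows that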

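$$
\begin{aligned}
\Pr[\mathcal{D}(x) = f^{\oplus k}(x)] &\le \Pr[|\{j : X_{j,M+1} = 1\}| \ge (1 - \alpha\varepsilon) k] + \tfrac{1}{2} \Pr[|\{j : X_{j,M+1} = 1\}| < (1 - \alpha\varepsilon) k] \\
&= \tfrac{1}{2}\left(1 + \Pr[|\{j : X_{j,M+1} = 1\}| \ge (1 - \alpha\varepsilon) k]\right) \\
&\le \tfrac{1}{2}\left(1 + \Pr_{Y \sim \mathcal{B}_{k,1-2\varepsilon}}[Y \ge (1 - \alpha\varepsilon) k]\right),
\end{aligned}
$$

using Eq. (5).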
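The randomized rounding used to define $X_{j,M+1}$, and the computation showing that $f(x^{j^*})$ becomes an unbiased bit once we condition on $X_{j^*,M+1} = 0$, can be replayed with exact rational arithmetic. In the Python sketch below, `r` stands for $r_{j^*} = \Pr[f(y) = b_{j^*}]$, restricted to $[1/2, 1)$ so that the event $X_{j^*,M+1} = 0$ has positive probability; the function name is illustrative.

```python
from fractions import Fraction


def rounding_step(r: Fraction) -> tuple[Fraction, Fraction]:
    """Given r = Pr[f(x) = b] in [1/2, 1), return (E[X], Pr[f(x) = b | X = 0])."""
    if not (Fraction(1, 2) <= r < 1):
        raise ValueError("r must lie in [1/2, 1)")
    a = 2 * r - 1                              # advantage a_j = 2 r_j - 1
    p_one_given_match = a / r                  # X = 1 w.p. a/r when f(x) = b
    expected_x = r * p_one_given_match         # X = 0 whenever f(x) != b
    p_zero_and_match = r * (1 - p_one_given_match)
    p_zero = p_zero_and_match + (1 - r)        # mismatch always gives X = 0
    return expected_x, p_zero_and_match / p_zero


for r in (Fraction(1, 2), Fraction(3, 5), Fraction(9, 10)):
    print(r, *rounding_step(r))  # E[X] equals 2r - 1; the conditional probability is 1/2
```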

Finally, to get the concrete bound claimed in the Theorem statement, first suppose $\varepsilon = 1/2$; in this case the bound follows easily since $Y = 0$ with certainty. Now if $\varepsilon < 1/2$, note that $(1 - \alpha\varepsilon) k = (1 - (\alpha/2)(2\varepsilon)) k$, and apply Lemma 5 with $\delta := 2\varepsilon < 1$ and $\beta := \alpha/2 \le 1/2$. □

8 Direct Product Theorems for Search Problems and Errorless Heuristics

We define a fairly general notion of search problems in the query model for which a direct product theorem can be proved. As a corollary we will obtain a direct product theorem for errorless heuristics, defined in Section 8.2.




8.1 Search problems 

We need some preliminary definitions. Given $u, v \in \{0,1,*\}^n$, say that $u$ and $v$ agree if $u_i \in \{0,1\}$ implies $v_i \in \{*, u_i\}$. Note that this definition is symmetric in $u$ and $v$. If $u, v$ agree, define their overlay $u \circ v \in \{0,1,*\}^n$ by $(u \circ v)_i := b \in \{0,1\}$ if either $u_i = b$ or $v_i = b$, otherwise $(u \circ v)_i := *$. Say that $u$ extends $v$ if $v_i \in \{0,1\}$ implies $u_i = v_i$.
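These operations on partial assignments are straightforward to implement if strings over $\{0,1,*\}$ are represented literally as Python strings, as in the sketch below. The representation and the function names, including the helper `solves` (whether $u$ extends some $v \in \mathcal{V}$), are illustrative choices rather than part of the text.

```python
def agree(u: str, v: str) -> bool:
    """u and v agree: wherever u is fixed (0/1), v is either '*' or the same bit."""
    if len(u) != len(v):
        raise ValueError("strings must have the same length n")
    return all(a == "*" or b in ("*", a) for a, b in zip(u, v))


def overlay(u: str, v: str) -> str:
    """u o v: take each fixed bit from whichever string fixes it (requires agreement)."""
    if not agree(u, v):
        raise ValueError("overlay is only defined for agreeing strings")
    return "".join(a if a != "*" else b for a, b in zip(u, v))


def extends(u: str, v: str) -> bool:
    """u extends v: every bit fixed in v is fixed to the same value in u."""
    return all(b == "*" or a == b for a, b in zip(u, v))


def solves(u: str, search_problem: set[str]) -> bool:
    """True if u extends some v in the search problem V."""
    return any(extends(u, v) for v in search_problem)


print(agree("0*1*", "011*"))               # True
print(overlay("0*1*", "*11*"))             # 011*
print(extends("011*", "0*1*"))             # True
print(solves("0*1*", {"1***", "0*1*"}))    # True
```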

Say we are given a distribution $\mu$ on $\{0,1\}^n$ and a (possibly randomized) query algorithm $\mathcal{R}$; if $\mathcal{R}$ runs on an input distributed according to $\mu$, we denote by $U_{\mathcal{R},\mu} \in \{0,1,*\}^n$ the random string describing the input bits seen by $\mathcal{R}$.

A search problem is defined by a subset $\mathcal{V} \subseteq \{0,1,*\}^n$. We say that $\mathcal{R}$ $\varepsilon$-solves the search problem $\mathcal{V}$ with respect to an input distribution $\mu$ over $\{0,1\}^n$ if, with probability $\ge 1 - \varepsilon$, $U_{\mathcal{R},\mu}$ extends some $v \in \mathcal{V}$. (We allow the possibility that some $x \in \mathrm{supp}(\mu)$ do not extend any $v \in \mathcal{V}$.) Define $\mathrm{Suc}_{T,\mu}(\mathcal{V}) := 1 - \varepsilon$, where $\varepsilon$ is the minimal value such that some $T$-query randomized algorithm $\varepsilon$-solves search problem $\mathcal{V}$ on inputs from $\mu$.

Define the $k$-fold search problem $\mathcal{V}^{\otimes k} := \{(v_1, \ldots, v_k) : v_j \in \mathcal{V}, \forall j \in [k]\} \subseteq \{0,1,*\}^{kn}$. Thus to solve $\mathcal{V}^{\otimes k}$, an algorithm must solve each of the $k$ constituent search problems. We generalize this notion in order to state a threshold DPT, which will imply our ordinary DPT. For a monotone subset $\mathcal{A} \subseteq \mathcal{P}([k])$, define

$$\mathcal{V}^{\otimes \mathcal{A}} := \{(v_1, \ldots, v_k) : \{j \in [k] : v_j \in \mathcal{V}\} \in \mathcal{A}\}.$$

Thus to solve $\mathcal{V}^{\otimes \mathcal{A}}$, an algorithm must solve 'sufficiently many' of the $k$ search problems, as specified by $\mathcal{A}$.

Recall the notation $N_r(\cdot)$ from Section 6. Our generalized threshold DPT for search problems is as follows:

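Theorem 14. Suppose the search problem $\mathcal{V}$ satisfies $\mathrm{Suc}_{T,\mu}(\mathcal{V}) \le 1 - \varepsilon$. Then for any $\alpha \in (0, 1]$ and any monotone $\mathcal{A} \subseteq \mathcal{P}([k])$,

$$\mathrm{Suc}_{\alpha\varepsilon T k, \mu^{\otimes k}}(\mathcal{V}^{\otimes \mathcal{A}}) \le \Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})],$$

where each $j \in [k]$ is independently included in $D$ with probability $1 - \varepsilon$.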

Proof. In the search setting, $\varepsilon$ can potentially be any value in $[0, 1]$. The boundary cases are trivial, so assume $0 < \varepsilon < 1$. As usual, we can assume that $T > 0$ and $\mathrm{supp}(\mu) = \{0,1\}^n$, and it is enough to bound the success probability of any deterministic $\alpha\varepsilon T k$-query algorithm.

Following Theorem 1, we first develop some concepts related to a computation on a single input to the search problem $\mathcal{V}$. For each $u \in \{0,1,*\}^n$ for which $|u| \le T$, let $\mathrm{Val}_{\mathcal{V}}(u) := 1$ if $u$ extends some $v \in \mathcal{V}$, otherwise $\mathrm{Val}_{\mathcal{V}}(u) := 0$. For a deterministic query algorithm $\mathcal{D}$, let $W_{\mathcal{V}}(u, \mathcal{D}) := \mathbb{E}[\mathrm{Val}_{\mathcal{V}}(u \circ U_{\mathcal{D},\mu^{(u)}})]$. (Note that $u$ and $U_{\mathcal{D},\mu^{(u)}}$ always agree.) If $|u| \le T$, let $W_{\mathcal{V}}(u) := \max_{\mathcal{D}} W_{\mathcal{V}}(u, \mathcal{D})$, where the maximum ranges over all deterministic algorithms making at most $T - |u|$ queries. In other words, $W_{\mathcal{V}}(u)$ is the maximum success probability of any $(T - |u|)$-query algorithm in solving $\mathcal{V}$ on an input $y \sim \mu^{(u)}$, where we reveal the bits described by $u$ 'for free' to the algorithm. Then we have:

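Lemma 15. 1. $W_{\mathcal{V}}(*^n) \le 1 - \varepsilon$.

2. For any $u \in \{0,1,*\}^n$ with $|u| < T$, and any $i \in [n]$, $\mathbb{E}_{y \sim \mu^{(u)}}[W_{\mathcal{V}}(u[x_i \leftarrow y_i])] \le W_{\mathcal{V}}(u)$.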
We omit the proof, which is essentially the same as that of Lemma 6.

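Let $\mathcal{D}$ be any deterministic algorithm making at most $M := \lfloor \alpha\varepsilon T k \rfloor$ queries and attempting to solve $\mathcal{V}^{\otimes \mathcal{A}}$ on inputs $x^1, \ldots, x^k \sim \mu^{\otimes k}$. For $0 \le t \le M$ and $j \in [k]$, let $u^j_t$ be defined as in the previous proofs. Let $X = \{X_{j,t}\}_{j \in [k], 0 \le t \le M}$, where $X_{j,t} := W_{\mathcal{V}}(u^j_t)$ if $|u^j_t| \le T$, otherwise $X_{j,t} := 0$.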
Unlike in Theorem 10, we have no need to add any additional steps to our random sequences. For $0 \le t \le M$, we let $U_t := (u^1_t, \ldots, u^k_t)$ just as before. Setting $N := M$ and reasoning as in Theorem 10, we verify that the assumptions of Lemma 9 are satisfied, with $p_j := 1 - \varepsilon \ge X_{j,0}$ (Lemma 15, part 1).

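Applying Lemma 9 to the monotone set $N_{\alpha\varepsilon k}(\mathcal{A})$, we conclude that

$$\Pr[\{j \in [k] : X_{j,M} = 1\} \in N_{\alpha\varepsilon k}(\mathcal{A})] \le \Pr[D \in N_{\alpha\varepsilon k}(\mathcal{A})], \qquad (6)$$

where each $j \in [k]$ is independently included in $D$ with probability $1 - \varepsilon$.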

Now condition on any execution of $\mathcal{D}$, and consider any $j \in [k]$ such that $X_{j,M} < 1$. By our definitions, there are two possibilities: either $|u^j_M| > T$ (there are fewer than $\alpha\varepsilon k$ such indices $j$), or $u^j_M$ does not extend any $v \in \mathcal{V}$. Thus if $\mathcal{D}$ solves the search problem $\mathcal{V}^{\otimes \mathcal{A}}$ on the present execution, we have $\{j \in [k] : X_{j,M} = 1\} \in N_{\alpha\varepsilon k}(\mathcal{A})$. Combining this with Eq. (6) yields the Theorem. □

From Theorem 14 we directly get a standard threshold DPT and an ordinary DPT for search problems. First, given a search problem $\mathcal{V} \subseteq \{0,1,*\}^n$ and a real number $s \in [0, k]$, define $\mathcal{C}[\ge s] := \{A \subseteq [k] : |A| \ge s\}$.

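Theorem 16. Suppose $\mathrm{Suc}_{T,\mu}(\mathcal{V}) \le 1 - \varepsilon$. Then for any $\alpha \in (0, 1]$ and any $\eta \in (0, 1]$,

$$\mathrm{Suc}_{\alpha\varepsilon T k, \mu^{\otimes k}}\big(\mathcal{V}^{\otimes \mathcal{C}[\ge \eta k]}\big) \le \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge (\eta - \alpha\varepsilon) k].$$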



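Proof. Apply Theorem 14 with $\mathcal{A} := \mathcal{C}[\ge \eta k]$, and note that $D \in N_{\alpha\varepsilon k}(\mathcal{C}[\ge \eta k])$ implies $|D| \ge \eta k - \alpha\varepsilon k$, that is, $D_1 + \cdots + D_k \ge (\eta - \alpha\varepsilon) k$, where $D_j := \mathbb{1}[j \in D]$. These indicator variables are independent with expectation $1 - \varepsilon$. □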

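Theorem 17. Suppose $\mathrm{Suc}_{T,\mu}(\mathcal{V}) \le 1 - \varepsilon$. Then for any $\alpha \in (0, 1]$,

$$\mathrm{Suc}_{\alpha\varepsilon T k, \mu^{\otimes k}}(\mathcal{V}^{\otimes k}) \le \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge (1 - \alpha\varepsilon) k].$$

Proof. Note that $\mathcal{V}^{\otimes k} = \mathcal{V}^{\otimes \mathcal{C}[\ge k]}$, so the result follows from Theorem 16 with $\eta := 1$. □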

8.2 Errorless heuristics 

An errorless heuristic for a (not necessarily Boolean) function $f$ is a randomized query algorithm $\mathcal{R}$ outputting values in $\{0, 1, ?\}$ such that for all $x$, $\mathcal{R}(x) \in \{f(x), ?\}$ with probability 1. We say that an errorless heuristic $\mathcal{R}$ $\varepsilon$-solves $f$ with zero error with respect to input distribution $\mu$ if $\Pr_{x \sim \mu}[\mathcal{R}(x) = f(x)] \ge 1 - \varepsilon$. Let $\mathrm{Suc}^{0\text{-err}}_{T,\mu}(f) := 1 - \varepsilon$, where $\varepsilon$ is the minimal value such that some $T$-query errorless heuristic $\varepsilon$-solves $f$ with zero error with respect to $\mu$. Note that $\mathrm{Suc}^{0\text{-err}}_{T,\mu}(f)$ is exactly $\mathrm{Suc}_{T,\mu}(\mathcal{V}_f)$, where the search problem $\mathcal{V}_f$ is defined as $\mathcal{V}_f := \{u \in \{0,1,*\}^n : u \text{ forces the value of } f\}$. Also, note that $\mathcal{V}_f^{\otimes k} = \mathcal{V}_{f^{\otimes k}}$. Thus the following result is immediately implied by Theorem 17.
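Theorem 18. Suppose $\mathrm{Suc}^{0\text{-err}}_{T,\mu}(f) \le 1 - \varepsilon$. Then for $\alpha \in (0, 1]$,

$$\mathrm{Suc}^{0\text{-err}}_{\alpha\varepsilon T k, \mu^{\otimes k}}(f^{\otimes k}) \le \Pr_{Y \sim \mathcal{B}_{k,1-\varepsilon}}[Y \ge (1 - \alpha\varepsilon) k].$$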

Note that an errorless heuristic for the $k$-fold XOR $f^{\oplus k}$ cannot produce any output other than '?' unless it has succeeded in determining $f^{\otimes k}$: if some $f(x^j)$ is not yet determined by the bits seen, then neither is the XOR. Thus Theorem 18 also implies an XOR lemma with the same success bound for errorless heuristics.
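For small $n$, the condition "$u$ forces the value of $f$", which defines $\mathcal{V}_f$, can be checked by brute force over all completions of $u$. The Python sketch below does this and uses it to test the XOR remark in a toy case: for the hypothetical choice $f = \mathrm{AND}$ on two bits and $k = 2$, a partial assignment forces $f(x^1) \oplus f(x^2)$ exactly when it forces both $f(x^1)$ and $f(x^2)$. Here `None` plays the role of $*$; all names are our own.

```python
from itertools import product
from typing import Callable, Optional, Sequence

PartialInput = Sequence[Optional[int]]   # None plays the role of '*'


def forced_value(f: Callable[[Sequence[int]], int], u: PartialInput) -> Optional[int]:
    """Value of f forced by the partial assignment u, or None if u does not force f.

    Brute force over all completions of u, so only suitable for small n."""
    free = [i for i, b in enumerate(u) if b is None]
    values = set()
    for fill in product((0, 1), repeat=len(free)):
        x = list(u)
        for i, b in zip(free, fill):
            x[i] = b
        values.add(f(x))
        if len(values) > 1:
            return None
    return values.pop()


def and2(x: Sequence[int]) -> int:
    """Hypothetical example function f = AND on two bits."""
    return x[0] & x[1]


def xor_of_two_copies(x: Sequence[int]) -> int:
    """f(x^1) XOR f(x^2), with x^1 = x[0:2] and x^2 = x[2:4]."""
    return and2(x[0:2]) ^ and2(x[2:4])


def xor_forced_iff_both_forced() -> bool:
    for u in product((0, 1, None), repeat=4):
        xor_forced = forced_value(xor_of_two_copies, u) is not None
        both_forced = (forced_value(and2, u[0:2]) is not None
                       and forced_value(and2, u[2:4]) is not None)
        if xor_forced != both_forced:
            return False
    return True


print(forced_value(and2, (0, None)))   # a 0 in the first bit forces AND = 0
print(xor_forced_iff_both_forced())
```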

Next we prove a worst-case analogue of Theorem 18. Define $R_0(f)$, the zero-error randomized query complexity of $f$, as the minimum $T$ for which some algorithm $\mathcal{R}$ outputs $f(x)$ with probability 1 for each $x$, and for which the expected number of queries made by $\mathcal{R}$ on any input is at most $T$. The following is another variant of Yao's minimax principle [Yao77]; we include a proof for completeness.



Lemma 19. Let $\eta \in (0,1]$. There exists a distribution $\mu_\eta$ over inputs to $f$ such that

$$\mathrm{Suc}^{0\text{-err}}_{\eta R_0(f),\,\mu_\eta}(f) \le \eta.$$



Proof. Consider the following 2-player game: player 1 chooses a (possibly randomized) errorless heuristic $\mathcal{R}$ for $f$ which makes at most $\eta R_0(f)$ queries, and player 2 chooses (simultaneously) an input $x$ to $f$. Player 1 wins if $\mathcal{R}(x) = f(x)$. We claim there exists a randomized strategy for player 2, that is, a distribution $\mu =: \mu_\eta$ over inputs to $f$, that beats any strategy of player 1 with probability at least $1-\eta$. This will prove the Lemma.

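To prove the claim, suppose for contradiction's sake that no such strategy for player 2 exists. Then, by the Minimax Theorem, there exists a randomized strategy for player 1 which wins with probability greater than $\eta$ against all choices of $x$. This strategy is itself a randomized algorithm making at most $\eta R_0(f)$ queries; let us call this algorithm $\mathcal{R}$. Consider the algorithm $\mathcal{R}'$ for $f$ that, on input $x$, repeatedly applies $\mathcal{R}$ to $x$ (with fresh randomness each time) until $\mathcal{R}$ produces an output other than '?', which $\mathcal{R}'$ then outputs. Then $\mathcal{R}'(x) = f(x)$ on every input. Also, on any input the number of runs of $\mathcal{R}$ is geometrically distributed with success probability greater than $\eta$, and each run makes at most $\eta R_0(f)$ queries, so the expected number of queries of $\mathcal{R}'$ is strictly less than

$$\sum_{m\ge1}(1-\eta)^{m-1}\eta\cdot\big(m\cdot\eta R_0(f)\big) = \Big(\sum_{m\ge1}m(1-\eta)^{m-1}\Big)\cdot\eta^2R_0(f) = \frac{1}{\eta^2}\cdot\eta^2R_0(f) = R_0(f),$$

contradicting the definition of $R_0(f)$. □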
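The repetition step in this proof can be illustrated numerically. The Python sketch below checks the series identity $\sum_{m\ge1} m(1-\eta)^{m-1}\eta = 1/\eta$ by truncation, and simulates a hypothetical errorless heuristic that answers with probability $p > \eta$ and outputs '?' otherwise; the average number of runs until a non-'?' answer should then be close to $1/p$, which is below $1/\eta$. The values of $p$, $\eta$ and the trial count are arbitrary illustrations.

```python
import random


def expected_runs_series(eta: float, terms: int = 10_000) -> float:
    """Truncation of sum_{m >= 1} m (1 - eta)^(m - 1) eta, which equals 1/eta."""
    return sum(m * (1 - eta) ** (m - 1) * eta for m in range(1, terms + 1))


def runs_until_answer(p: float, rng: random.Random) -> int:
    """Independent runs of a heuristic that answers w.p. p and outputs '?' otherwise."""
    runs = 1
    while rng.random() >= p:   # this run output '?', so R' tries again
        runs += 1
    return runs


def mean_runs(p: float, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(runs_until_answer(p, rng) for _ in range(trials)) / trials


eta, p = 0.3, 0.4                     # success probability p > eta, as in the proof
print(expected_runs_series(eta))       # approximately 1/eta
print(mean_runs(p, trials=20_000))     # approximately 1/p, hence below 1/eta
```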

Theorem 20. For $\alpha \in (0,1/2]$, any errorless heuristic for $f^{\otimes k}$ using at most $\alpha^2R_0(f)k/4$ queries has worst-case success probability less than $[22\alpha\ln(1/\alpha)]^k$.

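Proof. Set $\gamma := \alpha/2$. Let $\mu_\gamma$ be the distribution given by Lemma 19, so that $\mathrm{Suc}^{0\text{-err}}_{\gamma R_0(f),\,\mu_\gamma}(f) \le \gamma$. By Theorem 18 applied to $\alpha$, with $T := \gamma R_0(f)$ and $\varepsilon := 1-\gamma$,

$$\mathrm{Suc}^{0\text{-err}}_{\alpha(1-\gamma)\gamma R_0(f)k,\,\mu_\gamma^{\otimes k}}\big(f^{\otimes k}\big) \le \Pr_{Y\sim B_{k,\gamma}}\big[Y \ge (1-\alpha(1-\gamma))k\big].$$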
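We have $\alpha^2R_0(f)k/4 \le \alpha(1-\gamma)\gamma R_0(f)k$ (using $\gamma \le 1/2$), so the worst-case success probability of any errorless heuristic for $f^{\otimes k}$ making at most $\alpha^2R_0(f)k/4$ queries is at most its success probability under $\mu_\gamma^{\otimes k}$, and hence at most

$$\Pr_{Y\sim B_{k,\gamma}}\big[Y \ge (1-\alpha(1-\gamma))k\big] \le \big[1-(1-\gamma)+21\alpha\ln(1/\alpha)(1-\gamma)\big]^k$$

(applying Lemma 5 with $\beta := \alpha \le 1/2$ and $\delta := 1-\gamma$)

$$\le \big[\alpha/2 + 21\alpha\ln(1/\alpha)\big]^k < \big[22\alpha\ln(1/\alpha)\big]^k,$$

where the last step uses $\alpha/2 < \alpha\ln(1/\alpha)$ for $\alpha \le 1/2$.

□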
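The numerical steps in this proof can be sanity-checked for concrete values. The Python sketch below verifies, on a small grid of $\alpha$ and $k$ of our choosing, the query-budget inequality $\alpha^2/4 \le \alpha(1-\gamma)\gamma$, the final step $\alpha/2 + 21\alpha\ln(1/\alpha) < 22\alpha\ln(1/\alpha)$, and that the exact binomial tail lies below both the Lemma 5 bound and the final bound. Since $22\alpha\ln(1/\alpha) < 1$ only for small $\alpha$ (below about $0.01$), the grid includes such values; it is an illustration, not a proof.

```python
from math import ceil, comb, log


def binom_upper_tail(k: int, p: float, threshold: float) -> float:
    """Exact Pr[Y >= threshold] for Y ~ Binomial(k, p)."""
    start = max(0, ceil(threshold - 1e-9))   # tolerance absorbs float rounding
    return sum(comb(k, y) * p**y * (1 - p) ** (k - y) for y in range(start, k + 1))


def check_theorem20(alphas, ks) -> bool:
    for alpha in alphas:
        gamma = alpha / 2
        delta = 1 - gamma                    # Lemma 5 is applied with delta := 1 - gamma
        lemma5_base = 1 - delta + 21 * alpha * log(1 / alpha) * delta
        final_base = 22 * alpha * log(1 / alpha)
        assert alpha**2 / 4 <= alpha * (1 - gamma) * gamma
        assert alpha / 2 + 21 * alpha * log(1 / alpha) < final_base
        for k in ks:
            exact = binom_upper_tail(k, gamma, (1 - alpha * (1 - gamma)) * k)
            assert exact <= lemma5_base**k <= final_base**k
    return True


print(check_theorem20(alphas=[0.001, 0.003, 0.005, 0.05, 0.25, 0.5], ks=[1, 5, 20, 60]))
```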



9 A Direct Product Theorem for Decision Tree Size 

We measure the size of a decision tree $\mathcal{D}$, denoted $\mathrm{size}(\mathcal{D})$, as the number of leaf (output) vertices. Note that this is at least 1/2 the total number of vertices. Define $\mathrm{Suc}^{\mathrm{size}}_{T,\mu}(f)$ as the maximum success probability of any decision tree of size at most $T$ attempting to compute $f$ on an input drawn from distribution $\mu$. We have the following DPT for size-bounded query algorithms:

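Theorem 21. Suppose $\mathrm{Suc}^{\mathrm{size}}_{T,\mu}(f) \le 1-\varepsilon$. Then for $0 < \alpha \le 1$, $\mathrm{Suc}^{\mathrm{size}}_{T^{\alpha\varepsilon k},\,\mu^{\otimes k}}\big(f^{\otimes k}\big) \le 2^{\alpha\varepsilon k}(1-\varepsilon)^k$.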

Note how the size bound grows exponentially, rather than linearly, in $k$ in the above statement. Also note that, by convexity, Theorem 21 also bounds the success probability of any 'randomized size-$T^{\alpha\varepsilon k}$ algorithm' $\mathcal{R}$, i.e., of any probability distribution over decision trees of size at most $T^{\alpha\varepsilon k}$.

Proof. The proof follows that of Theorem 1, except that we need a new way to quantify the resources used by each of the $k$ inputs. First we develop some definitions pertaining to a single input to $f$. As in Theorem 1, let $W(u,\mathcal{D}) := \Pr_{y\sim\mu^{(u)}}[\mathcal{D}(y) = f(y)]$. Given a real number $Z \in [1,T]$, let $W^*_{\mathrm{size}}(u,Z) := \max_{\mathcal{D}} W(u,\mathcal{D})$, where the maximum is over all decision trees $\mathcal{D}$ of size at most $Z$.

Lemma 22. 1. $W^*_{\mathrm{size}}(*^n, T) \le 1-\varepsilon$.

2. Take any real numbers $S, S^{(0)}, S^{(1)} \ge 1$ for which $S = S^{(0)} + S^{(1)}$. Then for any $u \in \{0,1,*\}^n$ and any $i \in [n]$,

The proof is very similar to that of the corresponding lemma in the proof of Theorem 1 and is omitted.
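For intuition about $W^*_{\mathrm{size}}(u,Z)$, it can be computed exactly in tiny examples if sizes are restricted to integers (the lemma allows real sizes; the integer restriction is our simplification). A tree with at most $Z$ leaves is either a single leaf, or a query to some unseen bit $x_i$ whose two subtrees have leaf budgets $Z^{(0)} + Z^{(1)} = Z$, which is the kind of budget splitting that item 2 of Lemma 22 is concerned with. The Python sketch below assumes, purely for illustration, $f = \mathrm{AND}$ on two bits with $\mu$ uniform, and represents $\mu^{(u)}$ by conditioning $\mu$ on the input extending $u$.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

N_BITS = 2
INPUTS = list(product((0, 1), repeat=N_BITS))
MU = {x: Fraction(1, len(INPUTS)) for x in INPUTS}   # hypothetical: uniform mu
STAR = (None,) * N_BITS                              # the all-* assignment


def f(x) -> int:
    """Hypothetical example function: AND of two bits."""
    return x[0] & x[1]


def extends(x, u) -> bool:
    return all(b is None or b == xb for xb, b in zip(x, u))


def mass(u) -> Fraction:
    return sum((p for x, p in MU.items() if extends(x, u)), Fraction(0))


@lru_cache(maxsize=None)
def w_size(u, z: int) -> Fraction:
    """Best Pr[D(y) = f(y)] over trees D with <= z leaves, y ~ mu conditioned on extending u."""
    total = mass(u)
    # Option 1: a single leaf that outputs the more likely value of f.
    best = max(sum((p for x, p in MU.items() if extends(x, u) and f(x) == b), Fraction(0))
               for b in (0, 1)) / total
    # Option 2: query an unseen bit i and split the leaf budget as z = z0 + z1.
    for i, ui in enumerate(u):
        if ui is not None:
            continue
        for z0 in range(1, z):
            value = Fraction(0)
            for b, zb in ((0, z0), (1, z - z0)):
                child = u[:i] + (b,) + u[i + 1:]
                if mass(child) > 0:
                    value += mass(child) / total * w_size(child, zb)
            best = max(best, value)
    return best


for z in (1, 2, 3):
    print(z, w_size(STAR, z))
```

With one or two leaves the best one can do here is $3/4$ (guess 0, or query one bit and still guess), and three leaves suffice to compute AND exactly.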

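Now let $\mathcal{D}$ be any deterministic algorithm of size at most $T^{\alpha\varepsilon k}$ attempting to compute $f^{\otimes k}$ on input strings $x = (x^1,\ldots,x^k) \sim \mu^{\otimes k}$. Let $M := \lfloor T^{\alpha\varepsilon k}\rfloor$; $\mathcal{D}$ always makes at most $M$ queries.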

As in previous proofs, for $j \in [k]$ and $0 \le t \le M$, let $u^j_t \in \{0,1,*\}^n$ describe the outcomes of all queries made to $x^j$ after $\mathcal{D}$ has taken $t$ steps (where a 'step' consists of a query, unless $\mathcal{D}$ has halted, in which case a step has no effect).

Let $S_t$ be defined as the size (number of leaf vertices) of the subtree of $\mathcal{D}$ reached after $t$ steps have been taken (so $S_t = 1$ iff $\mathcal{D}$ has halted after at most $t$ queries). For each $j \in [k]$, we define a sequence $Z_{j,0},\ldots,Z_{j,M}$ as follows. Let $Z_{j,0} := T$. For $0 \le t < M$, if $\mathcal{D}$ has halted after $t$ steps, let $Z_{j,t+1} := Z_{j,t}$. Otherwise, if the $(t+1)$-st query made by $\mathcal{D}$ is not to $x^j$, we again let $Z_{j,t+1} := Z_{j,t}$. If the $(t+1)$-st query is to $x^j$, let $Z_{j,t+1} := \frac{S_{t+1}}{S_t}\cdot Z_{j,t}$.
Let $X_{j,t} := W^*_{\mathrm{size}}(u^j_t, Z_{j,t})$ if $Z_{j,t} \ge 1$; otherwise let $X_{j,t} := X_{j,t-1}$. Let $P_t := \prod_{j\in[k]} X_{j,t}$. Arguing as in Theorem 1, for each $0 \le t < M$, $\mathbb{E}[P_{t+1}] \le \mathbb{E}[P_t]$. It follows that

$$\mathbb{E}[P_M] \le \mathbb{E}[P_0] = W^*_{\mathrm{size}}(*^n, T)^k \le (1-\varepsilon)^k.$$

Condition on any complete execution of $\mathcal{D}$, as described by $u^1_M,\ldots,u^k_M$. Notice that if $Z_{j,M} \ge 1$, then (by the definitions) $X_{j,M}$ is an upper bound on the conditional success probability of guessing $f(x^j)$ correctly. Also, $X_{j,t} \ge 1/2$ for all $j,t$, and all inputs are independent after our conditioning. Thus (using the trivial bound $1 \le 2X_{j,M}$ for the remaining indices) the conditional success probability of computing $f^{\otimes k}(x)$ is at most $2^{|B|}P_M$, where we define the (random) set $B := \{j \in [k] : Z_{j,M} < 1\}$.

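Observe that $S_M = 1$, since the algorithm halts after at most $M$ steps. Every step either leaves $S_t$ unchanged or is a query to some $x^j$, in which case $Z_j$ alone is multiplied by $S_{t+1}/S_t$; hence these ratios telescope to $S_M/S_0 = \prod_{j\in[k]} Z_{j,M}/Z_{j,0}$, with $Z_{j,0} = T$. Moreover, $Z_{j,M} \le T$ for every $j$ (all ratios are at most 1), $Z_{j,M} < 1$ for $j \in B$, and $S_0 = \mathrm{size}(\mathcal{D}) \le T^{\alpha\varepsilon k}$. Then,

$$1 = S_M = S_0\prod_{j\in[k]}\frac{Z_{j,M}}{T} \le T^{-|B|}\,T^{\alpha\varepsilon k}.$$

Thus, $|B| \le \alpha\varepsilon k$ always. So the overall success probability is at most $\mathbb{E}[2^{|B|}P_M] \le 2^{\alpha\varepsilon k}\,\mathbb{E}[P_M] \le (2^{\alpha\varepsilon}(1-\varepsilon))^k$. □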
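The accounting identity at the end of this proof holds along every root-to-leaf path of any decision tree, so it can be checked mechanically. The Python sketch below builds random decision trees over $k$ inputs (an internal node is labelled only by which input $x^j$ it queries; the bit index and the branch taken do not affect the identity, so branches are chosen at random), follows a path while updating $Z_j \leftarrow (S_{t+1}/S_t)\,Z_j$, and checks $S_0\prod_j Z_{j,M}/T = 1$ and $T^{|B|} \le S_0$, which gives $|B| \le \alpha\varepsilon k$ whenever $S_0 \le T^{\alpha\varepsilon k}$. Exact rational arithmetic avoids spurious rounding in the test $Z_{j,M} < 1$; tree shapes and parameters are arbitrary.

```python
import random
from fractions import Fraction
from math import prod


def random_tree(rng: random.Random, k: int, depth: int):
    """A random decision tree: None is a leaf; (j, left, right) queries input x^j."""
    if depth == 0 or rng.random() < 0.3:
        return None
    j = rng.randrange(k)
    return (j, random_tree(rng, k, depth - 1), random_tree(rng, k, depth - 1))


def size(node) -> int:
    """Number of leaves of the (sub)tree."""
    return 1 if node is None else size(node[1]) + size(node[2])


def follow_path(rng: random.Random, root, k: int, T: int):
    """Walk a random root-to-leaf path, updating Z_{j,t+1} := (S_{t+1}/S_t) Z_{j,t}."""
    z = [Fraction(T)] * k
    node = root
    while node is not None:
        j, left, right = node
        child = left if rng.random() < 0.5 else right
        z[j] *= Fraction(size(child), size(node))
        node = child
    return z   # we end at a leaf, so S_M = 1


def check_accounting(trials: int = 300, k: int = 3, T: int = 4, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        root = random_tree(rng, k, depth=8)
        z = follow_path(rng, root, k, T)
        s0 = size(root)
        assert s0 * prod(zj / T for zj in z) == 1   # 1 = S_M = S_0 * prod_j Z_{j,M}/T
        B = [j for j, zj in enumerate(z) if zj < 1]
        assert T ** len(B) <= s0                     # hence |B| <= log_T S_0
    return True


print(check_accounting())
```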

One can also prove variants of our XOR lemma and other results in which we impose bounds 
on decision tree size rather than number of queries. We omit the details. 

10 DPTs for Dynamic Interaction 

So far, all of the computational tasks we have studied have involved algorithms querying a collection of fixed input strings. However, in many situations in computer science it is natural to consider more general problems of interaction with dynamic, stateful entities. An algorithm can still 'query' these entities, but these actions may influence the outcomes of future queries. In this section we describe how our proof methods can yield DPTs for these more general problems. The methods involved are essentially the same as in previous sections, and the theorem we give is just one example of the kind of DPT we can prove for dynamic interaction, so we will only sketch the proofs here, indicating the novel elements.

To make our scenario concrete, we first define the type of entity with which our query algorithms interact. Define an interactive automaton (IA) as a 5-tuple

$\mathcal{M} = (\mathrm{seeds}, \mathrm{states}, \mathrm{queries}, R, \Lambda)$, where:

• seeds, states, queries are each finite sets, and states contains a distinguished start state $s_0$;

• $R : \mathrm{seeds}\times\mathrm{states}\times\mathrm{queries} \to \{0,1\}$ is a response mapping;

• $\Lambda : \mathrm{seeds}\times\mathrm{states}\times\mathrm{queries} \to \mathrm{states}$ is a transition mapping.

We consider the scenario in which $\mathcal{M}$ is initialized to some seed $z \in \mathrm{seeds}$ drawn according to a distribution $\mu$, along with the start state $s_0$. The automaton retains the value $z$ throughout an interaction with a query algorithm $\mathcal{R}$ (which does not know the value $z$), but changes its state value. If $\mathcal{R}$ selects the query $q \in \mathrm{queries}$ while $\mathcal{M}$ has internal state $(z,s) \in \mathrm{seeds}\times\mathrm{states}$, then $\mathcal{M}$ returns the value $R(z,s,q)$ to $\mathcal{R}$ and transitions to the state $(z, \Lambda(z,s,q))$.
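The interaction model can be made concrete with a small simulation. The Python sketch below wraps the 5-tuple in a class and keeps the seed private to each session. The particular automaton in the example is a hypothetical illustration, not one from the paper: seeds are 3-bit integers, the state $s$ counts the queries made so far (capped at 3), a 'peek' query returns bit $s$ of $z$, and a 'skip' query returns 0 and only advances the counter. It shows how earlier queries change the responses to later ones.

```python
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class InteractiveAutomaton:
    """The 5-tuple (seeds, states, queries, R, Lambda) from the text."""
    seeds: frozenset
    states: frozenset
    queries: frozenset
    response: Callable[[Hashable, Hashable, Hashable], int]          # R(z, s, q) in {0, 1}
    transition: Callable[[Hashable, Hashable, Hashable], Hashable]   # Lambda(z, s, q)
    start_state: Hashable                                            # s_0


class Session:
    """One interaction with an IA initialized to (z, s_0); z is never exposed."""

    def __init__(self, ia: InteractiveAutomaton, seed: Hashable):
        assert seed in ia.seeds
        self._ia, self._seed, self._state = ia, seed, ia.start_state

    def query(self, q: Hashable) -> int:
        assert q in self._ia.queries
        r = self._ia.response(self._seed, self._state, q)
        self._state = self._ia.transition(self._seed, self._state, q)
        return r


# Hypothetical example automaton.
EXAMPLE = InteractiveAutomaton(
    seeds=frozenset(range(8)),
    states=frozenset(range(4)),
    queries=frozenset({"peek", "skip"}),
    response=lambda z, s, q: (z >> s) & 1 if q == "peek" else 0,
    transition=lambda z, s, q: min(s + 1, 3),
    start_state=0,
)

a = Session(EXAMPLE, seed=5)               # 5 = 0b101
print([a.query("peek") for _ in range(3)])
b = Session(EXAMPLE, seed=5)
print(b.query("skip"), b.query("peek"))    # skipping first changes which bit is revealed
```

For the $k$-fold setting one would create $k$ independent `Session` objects, which matches the assumption that the automata do not communicate.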

There are several kinds of tasks one can associate with an IA. One such task for the query algorithm $\mathcal{R}$ is to try to output a value $b \in B$ that satisfies some predicate $P(z,b)$, where $z$ is the seed to $\mathcal{M}$ and $P \subseteq \mathrm{seeds}\times B$ is a total relation over seeds and a finite set $B$. This, of course, is a generalization of the relation problems we studied in Section 6, and it is natural to study the $k$-fold setting, in which $\mathcal{R}$ interacts with $k$ IAs, querying one of them at each step. We assume that each IA only updates its state or sends a response to $\mathcal{R}$ when it is queried. In particular, the IAs do not communicate with each other.

We can transform the IA interaction scenario into an equivalent one which highlights the similarity with the standard query model, and makes it easy to apply our previous work to state and prove a DPT. For simplicity assume $|\mathrm{seeds}| = 2^m$. Given an IA $\mathcal{M}$ and an integer $N > 0$, for each $z \in \mathrm{seeds}$ we define a string $\xi(z) \in \{0,1\}^{m + (|\mathrm{queries}|+1)^N}$. There are two types of entries in this string. First there are $m$ 'ID' entries, which simply contain a binary encoding of $z$. Next there are $(|\mathrm{queries}|+1)^N$ 'response' entries, with each such entry indexed by an $N$-tuple $q = (q_1,\ldots,q_N) \in (\mathrm{queries}\cup\{*\})^N$. We are only interested in response entries of the form $q = (q_1,\ldots,q_r,*,*,\ldots,*)$, where $q_1,\ldots,q_r \in \mathrm{queries}$. For such an entry we define $\xi(z)_q \in \{0,1\}$ as the result of the following experiment: initialize $\mathcal{M}$ to state $(z,s_0)$, and perform the interaction in which a query algorithm asks queries $q_1,\ldots,q_r$ in that order. Let $\xi(z)_q$ be the final, $r$-th response made by $\mathcal{M}$.

Define a total relation $P_\xi \subseteq \{0,1\}^{m+(|\mathrm{queries}|+1)^N}\times B$ by

$$P_\xi := \{(\xi(z), b) : z \in \mathrm{seeds} \wedge P(z,b)\}.$$

Also, given a distribution $\mu$ over seeds, let $\mu_\xi$ be the distribution of $\xi(z)$, where $z \sim \mu$. In this way we map an IA interaction task onto a relation problem of the type studied in Section 6, with a corresponding map from initialization distributions to input distributions.

A standard query algorithm $\mathcal{R}$ (as studied in all previous sections) can faithfully simulate an interaction with $\mathcal{M}$ initialized to an unknown $z \in \mathrm{seeds}$, if given query access to $\xi(z)$. This works in the natural way: if its simulated queries up to the $r$-th step are $q_1,\ldots,q_r$, then for its $r$-th query to $\xi(z)$, $\mathcal{R}$ looks at the entry $(q_1,\ldots,q_r,*,*,\ldots,*)$ to learn $\mathcal{M}$'s $r$-th response. Call an algorithm 'interaction-faithful' if its sequence of queries to any input string always obeys this format.
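The encoding $\xi(z)$ and the interaction-faithful reading rule can be checked against a direct simulation on a toy automaton. In the Python sketch below, the automaton ($R$, $\Lambda$, $s_0$) is a hypothetical example, $\xi(z)$ is stored as a dictionary keyed by ID positions and by $N$-tuples over queries $\cup\{*\}$, and entries not of the form $(q_1,\ldots,q_r,*,\ldots,*)$ with $r \ge 1$, which the text leaves unspecified, are set to 0 as an arbitrary choice.

```python
from itertools import product

QUERIES = ("peek", "skip")
STAR = "*"
S0 = 0  # start state s_0


def R(z, s, q):      # hypothetical response mapping
    return (z >> s) & 1 if q == "peek" else 0


def LAM(z, s, q):    # hypothetical transition mapping: count queries, capped at 3
    return min(s + 1, 3)


def interact(z, qs):
    """Responses of the IA started in state (z, S0) to the query sequence qs."""
    s, out = S0, []
    for q in qs:
        out.append(R(z, s, q))
        s = LAM(z, s, q)
    return out


def xi(z, m, N):
    """xi(z): m ID entries, then one response entry per N-tuple over QUERIES + ('*',)."""
    table = {("ID", i): (z >> i) & 1 for i in range(m)}
    for t in product(QUERIES + (STAR,), repeat=N):
        r = next((i for i, q in enumerate(t) if q == STAR), N)   # length of non-* prefix
        well_formed = r >= 1 and all(q == STAR for q in t[r:])
        table[t] = interact(z, t[:r])[-1] if well_formed else 0  # 0: arbitrary filler
    return table


def faithful_responses(table, qs, N):
    """Interaction-faithful reader: the r-th query reads entry (q_1, ..., q_r, *, ..., *)."""
    return [table[tuple(qs[:r]) + (STAR,) * (N - r)] for r in range(1, len(qs) + 1)]


def check_encoding(m=3, N=3) -> bool:
    for z in range(2**m):
        table = xi(z, m, N)
        assert len(table) == m + (len(QUERIES) + 1) ** N
        for length in range(1, N + 1):
            for qs in product(QUERIES, repeat=length):
                assert faithful_responses(table, qs, N) == interact(z, qs)
    return True


print(check_encoding())
```

An unfaithful reader could of course just read the `("ID", i)` entries, which is exactly why the correspondence is stated only for interaction-faithful algorithms.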

Obviously, not all algorithms are interaction-faithful. For example, an unfaithful algorithm could simply look at the ID entries to learn $z$. Thus the relation problem $(P_\xi,\mu_\xi)$ can be much easier than the IA interaction problem defined by $(\mathcal{M},P,\mu)$. However, if we restrict attention to the class of interaction-faithful algorithms $\mathcal{R}$, then it is easy to see that there is an exact correspondence between the 'difficulty' of the two problems, at least for interactions lasting at most $N$ steps. That is, for $T \le N$, there is a $T$-query IA-interaction algorithm for $(\mathcal{M},P,\mu)$ with success probability $p$ if and only if there is a $T$-query interaction-faithful standard algorithm for $(P_\xi,\mu_\xi)$ with success probability $p$.

The good news is that we can prove a DPT for interaction-faithful query algorithms in almost exactly the same way as for unrestricted query algorithms. In fact, it is most natural to prove a DPT for a more general notion of faithfulness, which we define next. Say we are given $n > 0$ and a map $\tau : \{0,1,*\}^n \to \{0,1\}^n$, called a query-restriction map. Say that a (standard) query algorithm $\mathcal{R}$ on $n$ input bits is $\tau$-faithful if for every execution of $\mathcal{R}$ on any input, whenever the input bits seen by $\mathcal{R}$ so far are given by $u \in \{0,1,*\}^n$, $\mathcal{R}$ either halts or chooses a next input bit $x_j$ to query whose index satisfies $\tau(u)_j = 1$. In other words, a restriction map $\tau$ restricts the possible queries which can be made by a $\tau$-faithful algorithm, in a way that depends only on the description $u$ of the bits seen so far. Note that interaction-faithfulness as defined earlier is indeed equivalent to $\tau$-faithfulness for an appropriately defined $\tau = \tau_{\mathrm{int}}$.

For $k \ge 1$, define the $k$-fold product of a restriction map $\tau$, denoted $\tau^{\otimes k} : \{0,1,*\}^{kn} \to \{0,1\}^{kn}$, by $\tau^{\otimes k}(u^1,\ldots,u^k) := (\tau(u^1),\ldots,\tau(u^k))$. The map $\tau^{\otimes k}$ can be interpreted as a restriction map for algorithms making queries to $k$ $n$-bit strings. Note that $\mathcal{R}$ is $\tau^{\otimes k}$-faithful exactly if for each $j \in [k]$, $\mathcal{R}$'s queries to the $j$-th input (considered alone) are always $\tau$-faithful. Thus, the $k$-fold IA interaction problem defined by $(\mathcal{M},P,\mu)$ has 'difficulty' equivalent to the $k$-fold relation problem defined by $(P_\xi,\mu_\xi)$ for $\tau_{\mathrm{int}}^{\otimes k}$-faithful algorithms, provided $N$ is chosen large enough in the definition of $\xi(\cdot)$ (relative to the query bounds we are interested in).
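The definitions of $\tau$-faithfulness and of $\tau^{\otimes k}$ are easy to check mechanically for small $n$. In the Python sketch below, `tau_prefix` is a hypothetical restriction map (only the first unseen bit may be queried next), chosen purely for illustration; `None` stands for $*$, and an algorithm is given as a deterministic rule `next_query(u)` that returns an index or `None` to halt. With $k = 2$ blocks of $n = 2$ bits, interleaving the blocks in prefix order is $\tau^{\otimes 2}$-faithful, while reading the second bit of a block first is not.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Partial = Tuple[Optional[int], ...]   # None plays the role of '*'


def tau_prefix(u: Partial) -> Tuple[int, ...]:
    """Hypothetical restriction map: only the first unseen bit may be queried next."""
    out = [0] * len(u)
    for i, b in enumerate(u):
        if b is None:
            out[i] = 1
            break
    return tuple(out)


def tau_product(tau: Callable[[Partial], Tuple[int, ...]],
                blocks: Sequence[Partial]) -> Tuple[int, ...]:
    """k-fold product: apply tau to each block u^j separately and concatenate."""
    return tuple(bit for u in blocks for bit in tau(u))


def is_faithful(next_query, tau, n: int) -> bool:
    """Check tau-faithfulness of a deterministic algorithm on every input in {0,1}^n."""
    for x in product((0, 1), repeat=n):
        u = [None] * n
        while (i := next_query(tuple(u))) is not None:
            if tau(tuple(u))[i] != 1:
                return False
            u[i] = x[i]
    return True


def fixed_order(order):
    """Deterministic algorithm querying indices in a fixed order, skipping seen ones."""
    return lambda u: next((i for i in order if u[i] is None), None)


def tau2(u: Partial) -> Tuple[int, ...]:
    """tau_prefix^{(x)2} on two 2-bit blocks."""
    return tau_product(tau_prefix, (u[:2], u[2:]))


print(is_faithful(fixed_order([0, 2, 1, 3]), tau2, 4))   # interleaves blocks in prefix order
print(is_faithful(fixed_order([1, 0, 2, 3]), tau2, 4))   # reads bit 1 of block 1 first
```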

In light of these observations, a DPT for IA interaction algorithms follows by straightforward translation from the following DPT (generalizing Theorem 10) for standard query algorithms obeying a restriction map.

Theorem 23. Let $P \subseteq \{0,1\}^n \times B$ be a total relation such that any $T$-query, $\tau$-faithful algorithm solves the relation problem for $P$ with probability at most $1-\varepsilon$ under input distribution $\mu$.

For any algorithm $\mathcal{R}$ making queries to inputs $x = (x^1,\ldots,x^k) \sim \mu^{\otimes k}$ and producing output in $B^k$, define the random set $S[x]$ as in Theorem 10.

Suppose $\mathcal{R}$ is $\tau^{\otimes k}$-faithful and $\alpha\varepsilon Tk$-query-bounded for some $\alpha \in (0,1]$, and $\mathcal{A}$ is any monotone subset of $\mathcal{P}([k])$. Then conclusions 1 and 2 in Theorem 10 also hold for the present setting.

Proof. (Sketch) The proof follows that of Theorem 10; we only describe the differences. For $u \in \{0,1,*\}^n$, and for a deterministic algorithm $\mathcal{D}$ on $n$ input bits, let $W_P(u,\mathcal{D}) := \Pr_{y\sim\mu^{(u)}}[(y,\mathcal{D}(y)) \in P]$ as in Theorem 10. Also, say that $\mathcal{D}$ is $u$-inducing if, on any input $x \in \{0,1\}^n$ which extends $u$, the outcomes of $\mathcal{D}$'s first $|u|$ queries to $x$ are described by $u$.

If $|u| \le T$, define $W^{\tau}_P(u) := \max_{\mathcal{D}} W_P(u,\mathcal{D})$, where the max ranges over all deterministic, $u$-inducing, $\tau$-faithful algorithms $\mathcal{D}$ making at most $T$ queries. We have:

Lemma 24. 1. $W^{\tau}_P(*^n) \le 1-\varepsilon$.

2. For any $u \in \{0,1,*\}^n$ with $|u| < T$, and any $i \in [n]$, $\mathbb{E}_{y\sim\mu^{(u)}}\big[W^{\tau}_P(u[x_i \leftarrow y_i])\big] \le W^{\tau}_P(u)$.

The rest of the proof follows Theorem 10, with $W^{\tau}_P(u)$ taking the place of $W_P(u)$. □

One can also prove a DPT for search problems for $\tau$-faithful query algorithms, along the lines of Theorem 14. When applied to interactive automata via the translation described earlier, search problems correspond to tasks whose success conditions are defined in terms of the interaction itself (rather than the hidden seed of the IA, or any output produced by the query algorithm).

11 Questions for Future Work 

• Can the bounds in our threshold DPTs and XOR lemma be improved? For the threshold 
DPTs, how does the tightness of these bounds depend on the monotone set $\mathcal{A}$?

¹ (as defined in Section 8.1)
• It is still unknown what worst-case success probability in computing $f^{\otimes k}$ can be achieved in general when the number of queries allowed is $\alpha R_2(f)k$ for constant $\alpha > 0$. (As mentioned earlier, $O(R_2(f)k\log k)$ queries always suffice to achieve high success probability.) The corresponding question in the quantum query model was settled by Buhrman et al. [BNRdW05].

Our Theorem 1 would help resolve this question if we could identify a function $f$ and distribution $\mu$ for which $\mathrm{Suc}_{\alpha R_2(f),\mu}(f)$ approaches 1 not-too-quickly as $\alpha$ grows (say, not faster than $1 - 2^{-O(\alpha)}$, for $\alpha$ in a reasonable range). The AND/OR-tree evaluation problem, whose randomized query complexity was studied in [SW86], might be a good candidate.

• Can our direct product theorem be extended to the quantum query model? The main difficulty is that the natural analogue of the conditional-independence property we used in Theorem 1 is false for quantum query algorithms. The close connection shown by Reichardt [Rei09] between quantum query complexity and span programs may be useful in pursuing this question.

• Can the ideas in this paper help improve our understanding of the direct product problem in 
the communication and circuit models, or other computational settings? 
A Proof of Lemma 5

We use a general form of Chernoff's inequality:

Lemma 25 ([DP09], §1.3). Suppose $Y \sim B_{k,p}$, with $q := 1-p$. Then for $t \in [0,q)$,

$$\Pr[Y \ge (p+t)k] \le \left(\left(\frac{p}{p+t}\right)^{p+t}\left(\frac{q}{q-t}\right)^{q-t}\right)^k.$$
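Lemma 25 is the relative-entropy form of the Chernoff bound. As a sanity check, the Python sketch below compares it with the exact binomial tail on a small grid of $(k, p, t)$ values of our choosing; it is a numerical illustration, not a proof.

```python
from math import ceil, comb


def binom_upper_tail(k, p, threshold):
    """Exact Pr[Y >= threshold] for Y ~ Binomial(k, p)."""
    start = max(0, ceil(threshold - 1e-9))   # tolerance absorbs float rounding
    return sum(comb(k, y) * p**y * (1 - p) ** (k - y) for y in range(start, k + 1))


def chernoff_bound(k, p, t):
    """Right-hand side of Lemma 25, valid for t in [0, q) with q = 1 - p."""
    q = 1 - p
    return ((p / (p + t)) ** (p + t) * (q / (q - t)) ** (q - t)) ** k


def check_lemma25(ps=(0.1, 0.3, 0.5, 0.8), ks=(1, 5, 30, 100)) -> bool:
    for p in ps:
        q = 1 - p
        for k in ks:
            for frac in (0.0, 0.25, 0.5, 0.9):
                t = frac * q                 # stays inside [0, q)
                exact = binom_upper_tail(k, p, (p + t) * k)
                assert exact <= chernoff_bound(k, p, t) * (1 + 1e-9)
    return True


print(check_lemma25())
```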

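To prove Lemma 5, apply Lemma 25 with $t := (1-\beta)\delta$. We have

$$\Pr[Y \ge (1-\beta\delta)k] = \Pr\big[Y \ge ((1-\delta)+(1-\beta)\delta)k\big] \le \left[\left(\frac{1-\delta}{1-\beta\delta}\right)^{1-\beta\delta}\left(\frac{\delta}{\delta-(1-\beta)\delta}\right)^{\beta\delta}\right]^k.$$

Now $\beta\delta \le 1/2$ implies $(1-\beta\delta)^{-1} \le 1+2\beta\delta$. Also

$$\left(\frac{\delta}{\delta-(1-\beta)\delta}\right)^{\beta\delta} = \left(\frac{1}{\beta}\right)^{\beta\delta} = \left(e^{\frac{\beta\ln(1/\beta)\delta}{1-\beta\delta}}\right)^{1-\beta\delta} \le \left(e^{2\beta\ln(1/\beta)\delta}\right)^{1-\beta\delta},$$

since $1-\beta\delta \ge 1/2$. By our remarks on the function $x\ln(1/x)$ in Section 2.2, we have $2\beta\ln(1/\beta)\delta \le 2e^{-1}\delta < 1$. Now we claim that $e^x \le 1+(e-1)x \le 1+1.8x$ for $x \in [0,1]$. To see this, just note that $e^x$ is convex on $\mathbb{R}$, that $e^0 = 1$, and that $e^1 = 1+(e-1)\cdot 1$. Thus, $\left(e^{2\beta\ln(1/\beta)\delta}\right)^{1-\beta\delta} \le \left(1+1.8\cdot 2\beta\ln(1/\beta)\delta\right)^{1-\beta\delta}$. Plugging in these observations,

$$\begin{aligned}
\Pr[Y \ge (1-\beta\delta)k] &\le \big[(1-\delta)(1+2\beta\delta)(1+3.6\beta\ln(1/\beta)\delta)\big]^{(1-\beta\delta)k}\\
&\le \big[(1-\delta)(1+3\beta\ln(1/\beta)\delta)(1+3.6\beta\ln(1/\beta)\delta)\big]^{(1-\beta\delta)k}\\
&\le \big[(1-\delta)(1+11\beta\ln(1/\beta)\delta)\big]^{(1-\beta\delta)k}\\
&\le \big[1-\delta+11\beta\ln(1/\beta)\delta\big]^{(1-\beta\delta)k}\\
&= \frac{\big[1-\delta+11\beta\ln(1/\beta)\delta\big]^{k}}{\big[1-\delta+11\beta\ln(1/\beta)\delta\big]^{\beta\delta k}}. \qquad (7)
\end{aligned}$$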



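We bound the denominator in Eq. (7) by splitting into cases. First assume $\delta \le 1/2$. In this case we bound

$$\big[1-\delta+11\beta\ln(1/\beta)\delta\big]^{\beta\delta k} \ge (1-\delta)^{\beta\delta k} \ge 2^{-\beta\delta k} = e^{-(\ln 2)\beta\delta k}.$$

Next assume $\delta > 1/2$. In this case, since $11\ln(1/\beta) \ge 11\ln 2 > 7$ and $7\delta > 1$, we bound

$$\big[1-\delta+11\beta\ln(1/\beta)\delta\big]^{\beta\delta k} \ge (7\beta\delta)^{\beta\delta k} = e^{-\beta\delta\ln(1/(7\beta\delta))k} \ge e^{-\beta\delta\ln(1/\beta)k}.$$

Since $\ln 2 \le \ln(1/\beta)$, in either case the denominator is at least $e^{-\beta\delta\ln(1/\beta)k}$, and we conclude that Eq. (7) is at most

$$\Big[\big(1-\delta+11\beta\ln(1/\beta)\delta\big)e^{\beta\ln(1/\beta)\delta}\Big]^k \le \Big[\big(1-\delta+11\beta\ln(1/\beta)\delta\big)\big(1+1.8\beta\ln(1/\beta)\delta\big)\Big]^k \le \big[1-\delta+21\beta\ln(1/\beta)\delta\big]^k.$$

□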
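The chain of estimates above can be spot-checked numerically. The Python sketch below compares the exact value of $\Pr_{Y\sim B_{k,1-\delta}}[Y \ge (1-\beta\delta)k]$ with the bound $[1-\delta+21\beta\ln(1/\beta)\delta]^k$ just derived, on a grid of $\beta \le 1/2$, $\delta$ and $k$ that we picked, and also checks on a grid the elementary inequalities $e^x \le 1+1.8x$ on $[0,1]$ and $(1-x)^{-1} \le 1+2x$ on $[0,1/2]$ used along the way. This illustrates the lemma; it does not replace the proof.

```python
from math import ceil, comb, exp, log


def binom_upper_tail(k, p, threshold):
    """Exact Pr[Y >= threshold] for Y ~ Binomial(k, p)."""
    start = max(0, ceil(threshold - 1e-9))   # tolerance absorbs float rounding
    return sum(comb(k, y) * p**y * (1 - p) ** (k - y) for y in range(start, k + 1))


def lemma5_bound(k, beta, delta):
    """[1 - delta + 21 beta ln(1/beta) delta]^k."""
    return (1 - delta + 21 * beta * log(1 / beta) * delta) ** k


def check_lemma5() -> bool:
    grid = [i / 100 for i in range(101)]
    assert all(exp(x) <= 1 + 1.8 * x for x in grid)                  # x in [0, 1]
    assert all(1 / (1 - x / 2) <= 1 + 2 * (x / 2) for x in grid)     # x/2 in [0, 1/2]
    for beta in (0.001, 0.005, 0.01, 0.1, 0.5):
        for delta in (0.05, 0.3, 0.5, 0.7, 1.0):
            for k in (1, 4, 25, 80):
                exact = binom_upper_tail(k, 1 - delta, (1 - beta * delta) * k)
                assert exact <= lemma5_bound(k, beta, delta)
    return True


print(check_lemma5())
```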



B Proof of Lemma 7



Fix any $j \in [k]$ and consider any assignment $(\bar{x}^{j'})_{j'\in[k]\setminus\{j\}}$ of values $\bar{x}^{j'} \in \{0,1\}^n$ to the inputs other than the $j$-th input, where $\bar{x}^{j'}$ extends $u^{j'}_t$ for each $j' \ne j$. We show that, after conditioning on the query outcomes $u^1_t,\ldots,u^k_t$ and on the event $[x^{j'} = \bar{x}^{j'}\ \forall j' \ne j]$, the $j$-th input $x^j$ is distributed according to $\mu^{(u^j_t)}$. This will prove the Lemma.

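Consider each $y \in \{0,1\}^n$ which extends $u^j_t$. Now $u^1_t,\ldots,u^k_t$ are, by assumption, a possible description of the first $t$ queries made by $\mathcal{D}$ under some input. Since $\mathcal{D}$ is deterministic, and $(\bar{x}^1,\ldots,\bar{x}^{j-1},y,\bar{x}^{j+1},\ldots,\bar{x}^k)$ is consistent with $(u^1_t,\ldots,u^k_t)$, we conclude that $(u^1_t,\ldots,u^k_t)$ also describes the first $t$ queries made by $\mathcal{D}$ on $(\bar{x}^1,\ldots,\bar{x}^{j-1},y,\bar{x}^{j+1},\ldots,\bar{x}^k)$. Thus the conditional probability that $x^j = y$ is

$$\frac{\mu^{\otimes k}(\bar{x}^1,\ldots,\bar{x}^{j-1},y,\bar{x}^{j+1},\ldots,\bar{x}^k)}{\sum_{z\text{ extends }u^j_t}\mu^{\otimes k}(\bar{x}^1,\ldots,\bar{x}^{j-1},z,\bar{x}^{j+1},\ldots,\bar{x}^k)} = \frac{\mu(y)\prod_{j'\ne j}\mu(\bar{x}^{j'})}{\sum_{z\text{ extends }u^j_t}\mu(z)\prod_{j'\ne j}\mu(\bar{x}^{j'})} = \frac{\mu(y)}{\sum_{z\text{ extends }u^j_t}\mu(z)},$$

which equals $\mu^{(u^j_t)}(y)$ by definition of $\mu^{(u^j_t)}$. This completes the proof.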
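The conditional-independence statement proved above can be verified exhaustively in a toy case. The Python sketch below takes $k = 2$, $n = 2$, a hypothetical non-uniform distribution $\mu$ on $\{0,1\}^2$ and a hypothetical deterministic adaptive algorithm; for each $t$, each transcript $(u^1_t, u^2_t)$ and each fixed value of $x^2$, it checks with exact rational arithmetic that the conditional distribution of $x^1$ equals $\mu$ conditioned on extending $u^1_t$. It checks the case $j = 1$ only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

N_BITS = 2
INPUTS = list(product((0, 1), repeat=N_BITS))
# Hypothetical non-uniform distribution mu on {0,1}^2.
MU = dict(zip(INPUTS, (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10))))


def next_query(u1, u2):
    """Hypothetical deterministic adaptive algorithm: returns (input index, bit index)."""
    if u1[0] is None:
        return (0, 0)
    if u1[0] == 1:
        return (1, 0) if u2[0] is None else (0, 1)
    return (1, 1) if u2[1] is None else (0, 1)


def transcript(x1, x2, t):
    """Query outcomes (u^1_t, u^2_t) after t steps on input (x^1, x^2)."""
    u, xs = [[None] * N_BITS, [None] * N_BITS], (x1, x2)
    for _ in range(t):
        j, i = next_query(tuple(u[0]), tuple(u[1]))
        u[j][i] = xs[j][i]
    return tuple(u[0]), tuple(u[1])


def extends(x, u):
    return all(b is None or b == xb for xb, b in zip(x, u))


def conditional_matches(t: int) -> bool:
    joint = defaultdict(dict)   # (transcript, x^2) -> {x^1: Pr[transcript, x^2, x^1]}
    for x1, x2 in product(INPUTS, INPUTS):
        joint[(transcript(x1, x2, t), x2)][x1] = MU[x1] * MU[x2]
    for (tr, _x2), dist in joint.items():
        total = sum(dist.values())
        norm = sum(MU[y] for y in INPUTS if extends(y, tr[0]))
        for x1, p in dist.items():
            if p / total != MU[x1] / norm:   # should equal mu^{(u^1_t)}(x^1)
                return False
    return True


print(all(conditional_matches(t) for t in range(4)))
```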